Abstract. Within the field of numerical multilinear algebra, block tensors are increasingly 
important. Accordingly, it is appropriate to develop an infrastructure that supports reasoning about 
block tensor computation. In this paper we establish concise notation that is suitable for the analysis 
and development of block tensor algorithms, prove several useful block tensor identities, and make 
precise the notion of a block tensor unfolding. 
1. Introduction. The field of matrix computations has matured to the point 
that it is not necessary to provide scalar-level verifications of basic block-level 
operations. For example, if 

$$\begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix} = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix},$$

then without "ijk proof" it is understood that $C_{12} = A_{11}B_{12} + A_{12}B_{22}$ provided 
$A$ and $B$ are partitioned conformally. "Understandings" like this contribute to the 
culture of block matrix computations, enabling researchers to think at a high level 
when they are developing new algorithms and proofs. 

It is our contention that the emerging field of tensor computations needs to develop 
a similar infrastructure that gracefully supports block tensor operations. By a 
block tensor we mean a tensor whose entries are themselves tensors. As with matrices, 
the act of blocking a tensor is the act of partitioning the index range vectors 
associated with each dimension. Thus, if $\mathcal{A} \in \mathbb{R}^{9\times 5\times 8}$ and 

$$1{:}9 = [\,1{:}2 \mid 3{:}5 \mid 6{:}9\,], \qquad 1{:}5 = [\,1{:}3 \mid 4{:}5\,], \qquad 1{:}8 = [\,1{:}2 \mid 3{:}4 \mid 5{:}6 \mid 7{:}8\,], \tag{1.1}$$

then we are choosing to regard $\mathcal{A}$ as a 3-by-2-by-4 block tensor with block dimensions 
that are determined by the indicated partitionings of 1:9, 1:5, and 1:8. The colon 
notation can be used to specify the blocks. For example, the (2,1,3) block, $\mathcal{A}_{213}$, is 
prescribed by $\mathcal{A}(3{:}5, 1{:}3, 5{:}6)$. 

Block tensors are increasingly important for the same reasons that block matrices 
are increasingly important: 

Structure. Block-level sparsity is a common pattern because of nearest-neighbor 
coupling and other reasons [15]. 

Generalization. Block versions of point algorithms frequently have attractive 
features [M]. 

Performance. Blocking is the key to minimizing the overhead of communication [1]. 
[Figures 1.1 and 1.2. Block labels shown in the figures: $\mathcal{A}_{311} = \mathcal{A}(6{:}9, 1{:}3, 1{:}2)$, $\mathcal{A}_{213} = \mathcal{A}(3{:}5, 1{:}3, 5{:}6)$, $\mathcal{A}_{124} = \mathcal{A}(1{:}2, 4{:}5, 7{:}8)$.] 

Fig. 1.1. A vec-ordered, mode-1 unfolding of $\mathcal{A} \in \mathbb{R}^{9\times 5\times 8}$ with blocking (1.1). 

Fig. 1.2. A "block vec"-ordered, mode-1 unfolding of $\mathcal{A} \in \mathbb{R}^{9\times 5\times 8}$ with blocking (1.1). 



Indeed, there is a very strong coupling between block tensor computations and block 
matrix computations. This is because the dominant paradigm for tensor computation 
involves the device of unfolding. An unfolded (or flattened) tensor is a matrix obtained 
by systematically reorganizing the tensor's entries into a 2-dimensional array. In this 
framework, computations on a tensor A reduce to matrix computations on one or 
more of its unfoldings. For example, the higher-order singular value decomposition of 
a tensor involves computing the SVD of each modal unfolding [4]. See [13] for a nice 
overview of tensor decompositions and unfoldings. 

Given all the advantages that result when a matrix computation is organized at 
the block level, it makes sense for an unfolding of a block tensor $\mathcal{A}$ to have a related 
block structure of its own. In particular, $\mathcal{A}$'s blocks should map to contiguous blocks in 
the unfolding. This is not the case when a typical "vec-oriented" unfolding is invoked 
[13]. Consider the mode-1 unfolding $\mathcal{A}_{(1)}$ of a 9-by-5-by-8 tensor $\mathcal{A}$ with blocking 
(1.1). The unfolding, which is displayed in Fig 1.1, is a 9-by-40 matrix whose $i$-th 
row is $\mathrm{vec}(\mathcal{A}(i,:,:))^T$. (Recall that vec-of-a-matrix is the vector obtained by stacking 
its columns.) Notice that in the unfolding, $\mathcal{A}$'s flattened blocks are not contiguous. 
The primary purpose of this paper is to show how to permute the rows and columns 
of a vec-oriented unfolding so that its blocks are unfoldings of the tensor blocks. An 
example of such an unfolding is displayed in Fig 1.2. 

The paper is organized as follows. In §2 we review well-known connections between 
vec(·), Kronecker products, transposition, and the perfect shuffle permutation. 
A block version of vec(·) is defined in §3 and a related permutation is used to define 
the notion of a block unfolding. In §4 we show how to formulate a tensor contraction 
as a block matrix multiplication using the tools developed. 
2. Basic Notation and Operations. If $\mathcal{A} \in \mathbb{R}^{n_1\times\cdots\times n_d}$ and $\mathbf{i} = (i_1,\ldots,i_d)$, 
then $\mathcal{A}(\mathbf{i})$ denotes component $(i_1,\ldots,i_d)$ of tensor $\mathcal{A}$. We use calligraphic characters 
to designate tensors and bold lower case characters to denote vectors of integers. For 
$\mathcal{A}(\mathbf{i})$ to make sense we must have $1 \le i_k \le n_k$ for $k = 1{:}d$, i.e., $\mathbf{1} \le \mathbf{i} \le \mathbf{n}$. In general, 
if $\mathbf{i}$ and $\mathbf{j}$ have equal length, then $\mathbf{i} \le \mathbf{j}$ means that $i_k \le j_k$ for all $k$. 

The Matlab colon notation is used to specify index ranges. If $a \le b$ and $c > 0$, 
then $a{:}b$ is the vector $[a, a+1, \ldots, b]$ and $a{:}c{:}b$ is the vector $[a, a+c, a+2c, \ldots, a+mc]$ 
where $m = \lfloor (b-a)/c \rfloor$, i.e., the largest integer that is less than or equal to $(b-a)/c$. 

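If $A \in \mathbb{R}^{m\times n}$ and $B \in \mathbb{R}^{p\times q}$, then the Kronecker product $A \otimes B \in \mathbb{R}^{mp\times nq}$ is the 
block matrix 

$$A \otimes B = \begin{bmatrix} a_{11}B & \cdots & a_{1n}B \\ \vdots & \ddots & \vdots \\ a_{m1}B & \cdots & a_{mn}B \end{bmatrix}.$$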

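The outer product $\mathcal{C} = \mathcal{A} \circ \mathcal{B}$ of a tensor $\mathcal{A} \in \mathbb{R}^{n_1\times\cdots\times n_d}$ and a tensor $\mathcal{B} \in \mathbb{R}^{k_1\times\cdots\times k_e}$ 
is a tensor $\mathcal{C} \in \mathbb{R}^{n_1\times\cdots\times n_d\times k_1\times\cdots\times k_e}$ defined by 

$$\mathcal{C}(\mathbf{i}) = \mathcal{A}(\mathbf{i}(1{:}d)) \cdot \mathcal{B}(\mathbf{i}(d+1{:}d+e)), \qquad \mathbf{1} \le \mathbf{i} \le [\,\mathbf{n}\ \ \mathbf{k}\,].$$

The order of $\mathcal{A} \circ \mathcal{B}$ is the order of $\mathcal{A}$ plus the order of $\mathcal{B}$. Note that $A \otimes B$ is an 
unfolding of the order-4 tensor $\mathcal{A} \circ \mathcal{B}$ where $\mathcal{A}$ and $\mathcal{B}$ are the order-2 tensors (matrices) 
$A$ and $B$. 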

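2.1. The Vec Operation and Ordering. If $\mathcal{A} \in \mathbb{R}^{n_1\times\cdots\times n_d}$ and $N = n_1 \cdots n_d$, 
then $\mathrm{vec}(\mathcal{A}) \in \mathbb{R}^N$ is a column vector defined recursively by 

$$\mathrm{vec}(\mathcal{A}) = \begin{bmatrix} \mathrm{vec}(\mathcal{A}^{(1)}) \\ \vdots \\ \mathrm{vec}(\mathcal{A}^{(n_d)}) \end{bmatrix} \tag{2.1}$$

where $\mathcal{A}^{(k)}$ is the order-$(d-1)$ tensor 

$$\mathcal{A}^{(k)}(i_1,\ldots,i_{d-1}) = \mathcal{A}(i_1,\ldots,i_{d-1},k), \qquad 1 \le k \le n_d. \tag{2.2}$$

It is assumed that $\mathbf{1} \le \mathbf{i}(1{:}d-1) \le \mathbf{n}(1{:}d-1)$. If $d = 1$, then $\mathcal{A}$ is a column vector 
and $\mathrm{vec}(\mathcal{A}) = \mathcal{A}$. If $d = 2$, then $\mathcal{A}$ is a matrix and $\mathrm{vec}(\mathcal{A})$ stacks its columns. Each 
entry in tensor $\mathcal{A} \in \mathbb{R}^{n_1\times\cdots\times n_d}$ corresponds to a component of $\mathrm{vec}(\mathcal{A})$. This implicitly 
defines an index mapping function $\mathrm{ivec}(\,\cdot\,, \mathbf{n})$: 

$$\mathrm{ivec}(\mathbf{i}, \mathbf{n}) = i_1 + (i_2 - 1)n_1 + (i_3 - 1)n_1 n_2 + \cdots + (i_d - 1)n_1 \cdots n_{d-1}. \tag{2.3}$$

It is easy to show that if $v = \mathrm{vec}(\mathcal{A})$, then 

$$v_{\mathrm{ivec}(\mathbf{i},\mathbf{n})} = \mathcal{A}(\mathbf{i}) \tag{2.4}$$

for all $\mathbf{i}$ that satisfy $\mathbf{1} \le \mathbf{i} \le \mathbf{n}$. 

It should be noted that the "tensor vec" operation given by (2.1)-(2.4) reverts to 
the standard vec operation when $\mathcal{A}$ is a matrix. 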
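The index map (2.3) and the identity (2.4) can be checked with a few lines of code. The following is a minimal sketch, assuming a tensor is stored as a Python dictionary keyed by 1-based index tuples; the function names `ivec` and `vec` mirror the notation of this section, but the storage scheme and the sample tensor are illustrative assumptions.

```python
from itertools import product

def ivec(i, n):
    """Index map (2.3): 1-based multi-index i -> 1-based position in vec."""
    pos, stride = 1, 1
    for ik, nk in zip(i, n):
        pos += (ik - 1) * stride
        stride *= nk
    return pos

def vec(A, n):
    """vec of a tensor stored as a dict {1-based index tuple: value}."""
    v = [None] * len(A)
    for i, val in A.items():
        v[ivec(i, n) - 1] = val
    return v

n = (2, 3, 2)
# Entry (i1, i2, i3) holds the digits i1 i2 i3, which makes the ordering visible.
A = {i: 100 * i[0] + 10 * i[1] + i[2]
     for i in product(*(range(1, nk + 1) for nk in n))}
v = vec(A, n)
print(v[:4])   # the first index varies fastest
```

Because the first index varies fastest, the leading entries of `v` run through `A(1,1,1), A(2,1,1), A(1,2,1), A(2,2,1)`.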
2.2. Transposition, Vec, Kronecker Products, and Permutation. There 
is an important connection between matrix transposition and perfect shuffle 
permutations [HI [HI [HI-. In particular, if $A \in \mathbb{R}^{r\times q}$ and $s = qr$, then 

$$\mathrm{vec}(A^T) = \Pi_{q,r}\,\mathrm{vec}(A) \tag{2.5}$$

where $\Pi_{q,r} \in \mathbb{R}^{s\times s}$ is the $(q,r)$ perfect shuffle permutation defined by 

$$\Pi_{q,r}\, z = \begin{bmatrix} z(1{:}r{:}s) \\ z(2{:}r{:}s) \\ \vdots \\ z(r{:}r{:}s) \end{bmatrix}, \qquad z \in \mathbb{R}^s. \tag{2.6}$$

See [12]. If $Z \in \mathbb{R}^{q\times r}$ and $Y = Z^T$, then $\mathrm{vec}(Y) = \Pi_{q,r}^T\,\mathrm{vec}(Z)$. It is easy to verify 
that $\Pi_{q,r}^T = \Pi_{r,q}$. 

If $f \in \mathbb{R}^q$ and $g \in \mathbb{R}^r$, then $g \otimes f$ is a perfect shuffle of $f \otimes g$: 

$$\Pi_{q,r}(f \otimes g) = g \otimes f. \tag{2.7}$$

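An important consequence of this result applies to the case when $g$ is a block vector: 

$$\mathrm{diag}(\Pi_{p_1,q}, \ldots, \Pi_{p_k,q}) \cdot \Pi_{q,r} \left( f \otimes \begin{bmatrix} g_1 \\ \vdots \\ g_k \end{bmatrix} \right) = \begin{bmatrix} f \otimes g_1 \\ \vdots \\ f \otimes g_k \end{bmatrix}. \tag{2.8}$$

Here, $g_i \in \mathbb{R}^{p_i}$ and $r = p_1 + \cdots + p_k$. 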
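The perfect shuffle is easiest to handle as an index vector. The sketch below takes the $(q,r)$ perfect shuffle to be the permutation that gathers $z(1{:}r{:}s), z(2{:}r{:}s), \ldots, z(r{:}r{:}s)$ with $s = qr$ (this reading of the definition is an assumption), and checks the Kronecker swap $\Pi_{q,r}(f \otimes g) = g \otimes f$ and the vec-of-transpose relation on small data. The helper names are hypothetical.

```python
def shuffle_index(q, r):
    """0-based gather indices for [z(1:r:s), z(2:r:s), ..., z(r:r:s)], s = q*r."""
    s = q * r
    return [k for start in range(r) for k in range(start, s, r)]

def apply_shuffle(q, r, z):
    """Return Pi_{q,r} z without forming the permutation matrix."""
    return [z[k] for k in shuffle_index(q, r)]

def kron(f, g):
    """Kronecker product of two vectors."""
    return [fa * gb for fa in f for gb in g]

def vec(M):
    """Stack the columns of a matrix stored as a list of rows."""
    return [M[i][j] for j in range(len(M[0])) for i in range(len(M))]

def transpose(M):
    return [list(col) for col in zip(*M)]

q, r = 2, 3
f, g = [1, 2], [3, 5, 7]              # f in R^q, g in R^r
A = [[1, 2], [3, 4], [5, 6]]          # A is r-by-q

swapped = apply_shuffle(q, r, kron(f, g))     # should equal g (x) f
vec_At = apply_shuffle(q, r, vec(A))          # should equal vec(A^T)
```

Only an integer vector of length $s$ is stored, which is the representation advocated for permutations in the concluding section.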

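Tensor transposition can also be characterized in terms of vec(·) and perfect 
shuffles. If $\mathcal{A} \in \mathbb{R}^{n_1\times\cdots\times n_d}$ and $\mathbf{p}$ is a permutation of $1{:}d$, then $\mathcal{A}^{<\mathbf{p}>} \in \mathbb{R}^{n_{p_1}\times\cdots\times n_{p_d}}$ 
denotes the $\mathbf{p}$-transpose of $\mathcal{A}$ and is defined by 

$$\mathcal{A}^{<\mathbf{p}>}(i_{p_1},\ldots,i_{p_d}) = \mathcal{A}(i_1,\ldots,i_d), \qquad \mathbf{1} \le \mathbf{i} \le \mathbf{n}, \tag{2.9}$$

i.e., $\mathcal{A}^{<\mathbf{p}>}(\mathbf{i}(\mathbf{p})) = \mathcal{A}(\mathbf{i})$. The following lemma can be regarded as a generalization of 
(2.5): 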

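Lemma 2.1. If $\mathcal{A} \in \mathbb{R}^{n_1\times n_2\times n_3\times n_4}$ and $\mathcal{B} = \mathcal{A}^{<[1\,3\,2\,4]>}$, then 

$$\mathrm{vec}(\mathcal{B}) = (I_{n_4} \otimes \Pi_{n_3,n_2} \otimes I_{n_1})\,\mathrm{vec}(\mathcal{A}).$$

Proof. The proof follows from well-known facts that relate Kronecker products, 
vec(·), and the perfect shuffle. See [3[8llI3- □ 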

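Although Lemma 2.1 addresses an order-4 transposition, the result can be applied to 
tensors of arbitrary order simply by "fusing" adjacent modes. For example, suppose 
$\mathcal{C} \in \mathbb{R}^{n_1\times\cdots\times n_7}$ and $N_1 = n_1 n_2$, $N_2 = n_3$, $N_3 = n_4 n_5$, and $N_4 = n_6 n_7$. Define 
$\mathcal{A} \in \mathbb{R}^{N_1\times N_2\times N_3\times N_4}$ by 

$$\mathcal{A}(j_1,j_2,j_3,j_4) = \mathcal{C}(\mathbf{i}) \quad\text{where}\quad \begin{aligned} j_1 &= \mathrm{ivec}(\mathbf{i}(1{:}2), \mathbf{n}(1{:}2)) \\ j_2 &= \mathrm{ivec}(\mathbf{i}(3{:}3), \mathbf{n}(3{:}3)) \\ j_3 &= \mathrm{ivec}(\mathbf{i}(4{:}5), \mathbf{n}(4{:}5)) \\ j_4 &= \mathrm{ivec}(\mathbf{i}(6{:}7), \mathbf{n}(6{:}7)). \end{aligned}$$

Observe that $\mathrm{vec}(\mathcal{A}) = \mathrm{vec}(\mathcal{C})$ and 

$$\mathrm{vec}(\mathcal{A}^{<[1\,3\,2\,4]>}) = \mathrm{vec}(\mathcal{C}^{<[1\,2\,4\,5\,3\,6\,7]>}).$$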

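Two special applications of Lemma 2.1 are worth noting. Assume $\mathcal{A} \in \mathbb{R}^{n_1\times\cdots\times n_d}$. 
If $\mathbf{p} = [\,1{:}k-1,\ k+1,\ k,\ k+2{:}d\,]$, then 

$$\mathrm{vec}(\mathcal{A}^{<\mathbf{p}>}) = (I_{N_4} \otimes \Pi_{n_{k+1},n_k} \otimes I_{N_1})\,\mathrm{vec}(\mathcal{A}) \tag{2.10}$$

where $N_1 = n_1 \cdots n_{k-1}$ and $N_4 = n_{k+2} \cdots n_d$. This transposition swaps two adjacent 
modes, e.g., 

$$\mathcal{B} = \mathcal{A}^{<[1\,2\,4\,3\,5]>} \;\Rightarrow\; \mathcal{A}(i_1,i_2,i_3,i_4,i_5) = \mathcal{B}(i_1,i_2,i_4,i_3,i_5).$$

On the other hand, if $\mathbf{p} = [\,k,\ 1{:}k-1,\ k+1{:}d\,]$, then 

$$\mathrm{vec}(\mathcal{A}^{<\mathbf{p}>}) = (I_{N_4} \otimes \Pi_{n_k,N_2})\,\mathrm{vec}(\mathcal{A}) \tag{2.11}$$

where $N_2 = n_1 \cdots n_{k-1}$ and $N_4 = n_{k+1} \cdots n_d$. This transposition "moves" a designated 
mode "to the front," e.g., 

$$\mathcal{B} = \mathcal{A}^{<[3\,1\,2\,4\,5]>} \;\Rightarrow\; \mathcal{A}(i_1,i_2,i_3,i_4,i_5) = \mathcal{B}(i_3,i_1,i_2,i_4,i_5).$$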

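2.3. Unfolding a Tensor. Converting a tensor to a matrix is an important operation 
in tensor computations [9], [10], [11] US]. In order to unfold a tensor $\mathcal{A} \in \mathbb{R}^{n_1\times\cdots\times n_d}$ 
into a matrix, it is necessary to choose (a) an integer $e$ that satisfies $1 \le e \le d$ and (b) 
a permutation $\mathbf{p}$ of $1{:}d$. If 

$$\mathbf{r} = \mathbf{p}(1{:}e) \tag{2.12}$$

$$\mathbf{c} = \mathbf{p}(e+1{:}d) \tag{2.13}$$

then the $\mathbf{r} \times \mathbf{c}$ unfolding of $\mathcal{A}$ is the matrix $\mathcal{A}_{\mathbf{r}\times\mathbf{c}}$ whose $(\alpha,\beta)$ entry is given by 

$$\mathcal{A}_{\mathbf{r}\times\mathbf{c}}(\alpha,\beta) = \mathcal{A}^{<\mathbf{p}>}(i_1,\ldots,i_e,j_1,\ldots,j_{d-e}) \tag{2.14}$$

where 

$$\alpha = \mathrm{ivec}(\mathbf{i}, \mathbf{n}(\mathbf{r})), \qquad \mathbf{1} \le \mathbf{i} \le \mathbf{n}(\mathbf{r}) \tag{2.15}$$

$$\beta = \mathrm{ivec}(\mathbf{j}, \mathbf{n}(\mathbf{c})), \qquad \mathbf{1} \le \mathbf{j} \le \mathbf{n}(\mathbf{c}). \tag{2.16}$$

Note that $\mathcal{A}_{\mathbf{r}\times\mathbf{c}}$ has $n_{p_1} \cdots n_{p_e}$ rows and $n_{p_{e+1}} \cdots n_{p_d}$ columns. Each row and 
column of $\mathcal{A}_{\mathbf{r}\times\mathbf{c}}$ is the vec of a reduced-order subtensor. In particular, for all $\mathbf{i}$ and $\mathbf{j}$ 
that satisfy $\mathbf{1} \le \mathbf{i} \le \mathbf{n}(\mathbf{r})$ and $\mathbf{1} \le \mathbf{j} \le \mathbf{n}(\mathbf{c})$, we have 

$$\mathcal{A}_{\mathbf{r}\times\mathbf{c}}(\mathrm{ivec}(\mathbf{i}, \mathbf{n}(\mathbf{r})), :) = \mathrm{vec}(\mathcal{R}^{(\mathbf{i})})^T \tag{2.17}$$

$$\mathcal{A}_{\mathbf{r}\times\mathbf{c}}(:, \mathrm{ivec}(\mathbf{j}, \mathbf{n}(\mathbf{c}))) = \mathrm{vec}(\mathcal{C}^{(\mathbf{j})}) \tag{2.18}$$

where the tensors $\mathcal{R}^{(\mathbf{i})}$ and $\mathcal{C}^{(\mathbf{j})}$ are defined by 

$$\mathcal{R}^{(\mathbf{i})}(\mathbf{j}) = \mathcal{A}^{<\mathbf{p}>}(i_1,\ldots,i_e,j_1,\ldots,j_{d-e}) \tag{2.19}$$

$$\mathcal{C}^{(\mathbf{j})}(\mathbf{i}) = \mathcal{A}^{<\mathbf{p}>}(i_1,\ldots,i_e,j_1,\ldots,j_{d-e}). \tag{2.20}$$

Especially important are the modal unfoldings. If $e = 1$ and $\mathbf{p} = [\,k\ \ 1{:}k-1\ \ k+1{:}d\,]$, then $\mathcal{A}_{\mathbf{r}\times\mathbf{c}}$ is 
a mode-$k$ unfolding of $\mathcal{A}$. The columns of this matrix are referred to as mode-$k$ fibers 
of $\mathcal{A}$. Special conventions are required if $\mathcal{A}$ is to be unfolded to either a column or 
row vector. If $e = d$, then $\mathbf{c} = \emptyset$ and $\mathcal{A}_{\mathbf{r}\times\mathbf{c}} = \mathrm{vec}(\mathcal{A}^{<\mathbf{p}>})$. Likewise, if $e = 0$, then $\mathbf{r} = \emptyset$ 
and $\mathcal{A}_{\mathbf{r}\times\mathbf{c}} = \mathrm{vec}(\mathcal{A}^{<\mathbf{p}>})^T$. 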
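The definition of the $\mathbf{r}\times\mathbf{c}$ unfolding translates directly into code. The following is a small sketch, assuming the tensor is held as a dictionary keyed by 1-based index tuples and that `r` and `c` are 1-based mode lists that together form a permutation of $1{:}d$; the function `unfold` and the sample data are illustrative, not part of the original development.

```python
from itertools import product
from math import prod

def ivec(i, n):
    """1-based position of multi-index i in vec order (first index fastest)."""
    pos, stride = 1, 1
    for ik, nk in zip(i, n):
        pos += (ik - 1) * stride
        stride *= nk
    return pos

def unfold(A, n, r, c):
    """r-by-c unfolding of a tensor stored as {1-based index tuple: value}."""
    nr = [n[k - 1] for k in r]          # n(r): sizes of the row modes
    nc = [n[k - 1] for k in c]          # n(c): sizes of the column modes
    U = [[None] * prod(nc) for _ in range(prod(nr))]
    for x, val in A.items():
        i = [x[k - 1] for k in r]       # row multi-index
        j = [x[k - 1] for k in c]       # column multi-index
        U[ivec(i, nr) - 1][ivec(j, nc) - 1] = val
    return U

n = (2, 3, 2)
A = {x: 100 * x[0] + 10 * x[1] + x[2]
     for x in product(*(range(1, nk + 1) for nk in n))}

mode1 = unfold(A, n, [1], [2, 3])       # 2-by-6 modal unfolding
mode2 = unfold(A, n, [2], [1, 3])       # 3-by-4 modal unfolding
```

With the digit-coded entries used here, each row of `mode2` lists the entries that share a fixed second index, ordered with the first index varying fastest.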
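2.4. Special Cases. The preceding results take on a special form when $\mathcal{A}$ is a 
rank-1 tensor. Suppose $\mathcal{A} = a^{(1)} \circ \cdots \circ a^{(d)}$ where $a^{(k)} \in \mathbb{R}^{n_k}$ for $k = 1, \ldots, d$, i.e., 

$$\mathcal{A}(i_1,\ldots,i_d) = a^{(1)}(i_1) \cdots a^{(d)}(i_d), \qquad \mathbf{1} \le \mathbf{i} \le \mathbf{n}.$$

It follows from (2.1)-(2.4) that if 

$$v = \mathrm{vec}(a^{(1)} \circ \cdots \circ a^{(d)}),$$

then 

$$v = a^{(d)} \otimes \cdots \otimes a^{(1)} \tag{2.21}$$

and 

$$v_{\mathrm{ivec}(\mathbf{i},\mathbf{n})} = a^{(1)}(i_1) \cdots a^{(d)}(i_d), \qquad \mathbf{1} \le \mathbf{i} \le \mathbf{n}. \tag{2.22}$$

If $\mathbf{p}$ is a permutation of $1{:}d$, then from the definition of the $\mathbf{p}$-transpose in (2.9) and 
the definition of $\mathcal{A}_{\mathbf{r}\times\mathbf{c}}$ in (2.12)-(2.16) we have 

$$\mathcal{A}^{<\mathbf{p}>} = a^{(p_1)} \circ \cdots \circ a^{(p_d)} \tag{2.23}$$

and 

$$\mathcal{A}_{\mathbf{r}\times\mathbf{c}} = \mathrm{vec}(a^{(r_1)} \circ \cdots \circ a^{(r_e)}) \cdot \mathrm{vec}(a^{(c_1)} \circ \cdots \circ a^{(c_{d-e})})^T. \tag{2.24}$$

In other words, the unfolding of a rank-1 tensor is a rank-1 matrix. These rank-1 
facts simplify some of the proofs that follow in the next section. 

We consider another special case that relates to the multilinear product, see §4.2. 
Suppose $\mathcal{B} = B^{(1)} \circ \cdots \circ B^{(d)}$ where $B^{(k)} \in \mathbb{R}^{q_k\times n_k}$ for $k = 1, \ldots, d$, i.e., 

$$\mathcal{B}(i_1,j_1,\ldots,i_d,j_d) = B^{(1)}(i_1,j_1) \cdots B^{(d)}(i_d,j_d).$$

Note that $\mathcal{B}$ is an order-$2d$ tensor. If $\mathbf{r} = 1{:}2{:}2d$, $\mathbf{c} = 2{:}2{:}2d$, and $\mathbf{p} = [\,\mathbf{r}\ \ \mathbf{c}\,]$, then 
for all $\mathbf{i}$ and $\mathbf{j}$ that satisfy $\mathbf{1} \le \mathbf{i} \le \mathbf{q}$ and $\mathbf{1} \le \mathbf{j} \le \mathbf{n}$ we have 

$$\mathcal{B}_{\mathbf{r}\times\mathbf{c}}(\alpha,\beta) = B^{(1)}(i_1,j_1) \cdots B^{(d)}(i_d,j_d)$$

where $\alpha = \mathrm{ivec}(\mathbf{i},\mathbf{q})$ and $\beta = \mathrm{ivec}(\mathbf{j},\mathbf{n})$. However, this is precisely the $(\alpha,\beta)$ entry of 
the matrix $B^{(d)} \otimes \cdots \otimes B^{(1)}$. Thus, 

$$\left( B^{(1)} \circ \cdots \circ B^{(d)} \right)_{[1:2:2d]\times[2:2:2d]} = B^{(d)} \otimes \cdots \otimes B^{(1)}. \tag{2.25}$$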
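The rank-1 identity $\mathrm{vec}(a^{(1)} \circ \cdots \circ a^{(d)}) = a^{(d)} \otimes \cdots \otimes a^{(1)}$ is easy to confirm numerically. The sketch below does so for $d = 3$ with arbitrary illustrative vectors; the reversal of the factor order reflects the convention that the first index varies fastest in vec.

```python
from itertools import product

def kron(f, g):
    """Kronecker product of two vectors."""
    return [fa * gb for fa in f for gb in g]

a1, a2, a3 = [1, 2], [3, 5, 7], [11, 13]
n = (len(a1), len(a2), len(a3))

# vec ordering: the first index varies fastest, the last index slowest.
# itertools.product varies its last argument fastest, so list the modes in reverse.
v = [a1[i1] * a2[i2] * a3[i3]
     for i3, i2, i1 in product(range(n[2]), range(n[1]), range(n[0]))]

w = kron(a3, kron(a2, a1))    # a^(3) (x) a^(2) (x) a^(1)
```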

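3. Block Notation and Operations. In this section we formalize the notion 
of a block tensor [15], develop a block version of vec(·), and explain how to permute 
$\mathcal{A}_{\mathbf{r}\times\mathbf{c}}$ into a block matrix whose blocks are unfoldings of $\mathcal{A}$'s blocks. The presentation 
is simplified if we make use of multi-indexed subscripts. Suppose 

$$\mathbf{1} \le \mathbf{i} \le \mathbf{s} = [s_1, \ldots, s_e], \qquad S = s_1 \cdots s_e$$

$$\mathbf{1} \le \mathbf{j} \le \mathbf{t} = [t_1, \ldots, t_f], \qquad T = t_1 \cdots t_f.$$

To say that $v_{\mathbf{i}}$ is the $\mathbf{i}$-th component of vector $v \in \mathbb{R}^S$ is to say that $v_{\mathbf{i}} = v_{\mathrm{ivec}(\mathbf{i},\mathbf{s})}$. 
Similarly, if $D_1, \ldots, D_S$ are square matrices and $D = \mathrm{diag}(\ldots, D_{\mathbf{i}}, \ldots)$, then $D$ is a 
block diagonal matrix whose $\mathbf{i}$-th diagonal block is $D_{\mathrm{ivec}(\mathbf{i},\mathbf{s})}$. Finally, if $C = (C_{ij})$ is 
an $S$-by-$T$ block matrix, then $C_{\mathbf{i}\mathbf{j}}$ is its $(\mathbf{i},\mathbf{j})$-th block, i.e., $C_{\mathbf{i}\mathbf{j}} = C_{\mathrm{ivec}(\mathbf{i},\mathbf{s}),\mathrm{ivec}(\mathbf{j},\mathbf{t})}$. 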
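3.1. Tensor Blockings. We say that 

$$M = \{\mathbf{m}^{(1)}, \ldots, \mathbf{m}^{(d)}\} \tag{3.1}$$

is a blocking for $\mathcal{A} \in \mathbb{R}^{n_1\times\cdots\times n_d}$ if 

$$\mathbf{m}^{(k)} = [\,m^{(k)}_1, \ldots, m^{(k)}_{b_k}\,] \tag{3.2}$$

is a vector of positive integers that sums to $n_k$ for $k = 1, \ldots, d$. If $\mathbf{1} \le \mathbf{i} \le \mathbf{b}$, then 
block $\mathbf{i}$ is the $m^{(1)}_{i_1} \times \cdots \times m^{(d)}_{i_d}$ tensor defined by 

$$\mathcal{A}_{\mathbf{i}} = \mathcal{A}\left( l^{(1)}_{i_1}{:}u^{(1)}_{i_1},\ \ldots,\ l^{(d)}_{i_d}{:}u^{(d)}_{i_d} \right) \tag{3.3}$$

where the lower and upper bound vectors $\mathbf{l}^{(1)}, \ldots, \mathbf{l}^{(d)}$ and $\mathbf{u}^{(1)}, \ldots, \mathbf{u}^{(d)}$ are defined 
by 

$$l^{(k)}_j = m^{(k)}_1 + \cdots + m^{(k)}_{j-1} + 1 \tag{3.4}$$

$$u^{(k)}_j = m^{(k)}_1 + \cdots + m^{(k)}_j = l^{(k)}_j - 1 + m^{(k)}_j \tag{3.5}$$

for $k = 1, \ldots, d$. The blocking $M$ identifies $\mathcal{A}$ as a $b_1 \times b_2 \times \cdots \times b_d$ block tensor. The 
number of elements in each tensor block $\mathcal{A}_{\mathbf{i}}$ turns out to be a quantity of importance 
and to that end we define the "volume function" $\mathrm{vol}_M(\cdot)$ by 

$$\mathrm{vol}_M(\mathbf{i}) = m^{(1)}_{i_1} \cdots m^{(d)}_{i_d}, \qquad \mathbf{1} \le \mathbf{i} \le \mathbf{b}. \tag{3.6}$$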
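A short sketch may help fix the bookkeeping behind a blocking: cumulative sums of each blocking vector give the lower and upper index bounds of every block, and the volume function is a product of block sizes. The example blocking below is the one for the 9-by-5-by-8 tensor of the introduction, read off from the block ranges quoted there (an inference, since only three blocks are listed explicitly); the function names `bounds` and `vol` are hypothetical.

```python
def bounds(m):
    """Lower and upper 1-based index bounds for one blocking vector m."""
    lower, upper, total = [], [], 0
    for mj in m:
        lower.append(total + 1)   # l_j = m_1 + ... + m_{j-1} + 1
        total += mj
        upper.append(total)       # u_j = m_1 + ... + m_j
    return lower, upper

def vol(M, i):
    """Number of entries in block i (1-based multi-index) for blocking M."""
    out = 1
    for mk, ik in zip(M, i):
        out *= mk[ik - 1]
    return out

# Assumed blocking of the 9-by-5-by-8 example: 1:9, 1:5 and 1:8 split into
# pieces of sizes [2,3,4], [3,2] and [2,2,2,2].
M = [[2, 3, 4], [3, 2], [2, 2, 2, 2]]
ranges = [bounds(m) for m in M]

# Index ranges of block (2,1,3): one (lower, upper) pair per mode.
k = (2, 1, 3)
block_213 = [(lo[kt - 1], up[kt - 1]) for (lo, up), kt in zip(ranges, k)]
```

Under this blocking, `block_213` reproduces the ranges 3:5, 1:3, 5:6 quoted for the (2,1,3) block.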



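3.2. The Vec$_M$(·) Operation. If $M$ is a blocking of $\mathcal{A} \in \mathbb{R}^{n_1\times\cdots\times n_d}$ given by 
(3.1)-(3.5), then $\mathrm{vec}_M(\mathcal{A})$ is the block vector 

$$\mathrm{vec}_M(\mathcal{A}) = \begin{bmatrix} v_{\mathbf{1}} \\ \vdots \\ v_{\mathbf{b}} \end{bmatrix}, \qquad v_{\mathbf{i}} = \mathrm{vec}(\mathcal{A}_{\mathbf{i}}), \tag{3.7}$$

where $\mathbf{1} \le \mathbf{i} \le \mathbf{b}$. In other words, $\mathrm{vec}_M(\mathcal{A})$ stacks the vec's of $\mathcal{A}$'s blocks where the 
blocks are taken in the vec-order. 

To illustrate this notation in the familiar matrix case, if 

$$M = \{\mathbf{m}^{(1)}, \mathbf{m}^{(2)}\} = \{[\,m^{(1)}_1\ m^{(1)}_2\,],\ [\,m^{(2)}_1\ m^{(2)}_2\ m^{(2)}_3\,]\}$$

is a blocking for $A \in \mathbb{R}^{n_1\times n_2}$, then we are choosing to regard $A$ as a 2-by-3 block 
matrix 

$$A = \begin{bmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \end{bmatrix} \tag{3.8}$$

whose block rows have heights $m^{(1)}_1, m^{(1)}_2$ and whose block columns have widths $m^{(2)}_1, m^{(2)}_2, m^{(2)}_3$. 
In this case, $\mathrm{vec}_M(\cdot)$ and $\mathrm{vol}_M(\cdot)$ are given by 

$$\mathrm{vec}_M(A) = \begin{bmatrix} v_{[1,1]} \\ v_{[2,1]} \\ v_{[1,2]} \\ v_{[2,2]} \\ v_{[1,3]} \\ v_{[2,3]} \end{bmatrix} = \begin{bmatrix} \mathrm{vec}(A_{11}) \\ \mathrm{vec}(A_{21}) \\ \mathrm{vec}(A_{12}) \\ \mathrm{vec}(A_{22}) \\ \mathrm{vec}(A_{13}) \\ \mathrm{vec}(A_{23}) \end{bmatrix}, \qquad \mathrm{vol}_M(\mathbf{i}) = \begin{cases} m^{(1)}_1 m^{(2)}_1 & \text{if } \mathbf{i} = [1,1] \\ m^{(1)}_2 m^{(2)}_1 & \text{if } \mathbf{i} = [2,1] \\ m^{(1)}_1 m^{(2)}_2 & \text{if } \mathbf{i} = [1,2] \\ m^{(1)}_2 m^{(2)}_2 & \text{if } \mathbf{i} = [2,2] \\ m^{(1)}_1 m^{(2)}_3 & \text{if } \mathbf{i} = [1,3] \\ m^{(1)}_2 m^{(2)}_3 & \text{if } \mathbf{i} = [2,3]. \end{cases}$$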
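For the matrix case just described, the block vec operation can be written in a few lines. The following sketch assumes the matrix is a list of rows and that the row and column blockings are given as lists of block sizes; the function name `vec_blocked` and the 3-by-3 example are illustrative.

```python
def vec(M):
    """Stack the columns of a matrix stored as a list of rows."""
    return [M[i][j] for j in range(len(M[0])) for i in range(len(M))]

def vec_blocked(A, m1, m2):
    """Block vec of a matrix A with row blocking m1 and column blocking m2.
    Blocks are visited in vec order: down each block column, then to the next."""
    out, c0 = [], 0
    for w in m2:                      # block columns (slow index)
        r0 = 0
        for h in m1:                  # block rows (fast index)
            block = [row[c0:c0 + w] for row in A[r0:r0 + h]]
            out.extend(vec(block))
            r0 += h
        c0 += w
    return out

A = [[11, 12, 13],
     [21, 22, 23],
     [31, 32, 33]]
v = vec_blocked(A, [1, 2], [2, 1])    # a 2-by-2 block matrix
```

Here the four blocks are visited in the order $A_{11}, A_{21}, A_{12}, A_{22}$, and each contributes its own column-stacked entries.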
As we mentioned in the introduction, our goal is to permute the rows and columns 
of the unfolding $\mathcal{A}_{\mathbf{r}\times\mathbf{c}}$ so that its blocks are unfoldings of $\mathcal{A}$'s blocks. To be more 
precise, if $\mathcal{A} = (\mathcal{A}_{\mathbf{i}})$ is a block tensor our goal is to determine permutation matrices 
$P_R$ and $P_C$ so that 

$$\mathcal{A}_{R\times C} = P_R\, \mathcal{A}_{\mathbf{r}\times\mathbf{c}}\, P_C^T \tag{3.9}$$

is a block matrix whose blocks are the matrices $(\mathcal{A}_{\mathbf{k}})_{\mathbf{r}\times\mathbf{c}}$. It turns out that the 
permutations $P_R$ and $P_C$ map "vec-of-a-tensor" to "vec$_M$-of-a-tensor." This is not 
surprising since the rows and columns of $\mathcal{A}_{\mathbf{r}\times\mathbf{c}}$ are vec's of reduced order block tensors, 
see (2.17)-(2.20). 

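Theorem 3.1. Suppose $M = \{\mathbf{m}^{(1)}, \ldots, \mathbf{m}^{(d)}\}$ is a blocking of $\mathcal{A} \in \mathbb{R}^{n_1\times\cdots\times n_d}$ 
with 

$$\mathbf{m}^{(k)} = [\,m^{(k)}_1, \ldots, m^{(k)}_{b_k}\,], \qquad k = 1, \ldots, d.$$

For $k = 1, \ldots, d$ set 

$$N_k = n_1 \cdots n_k, \qquad M_k = \{\mathbf{m}^{(1)}, \ldots, \mathbf{m}^{(k)}\},$$

and define 

$$Q_k = \begin{cases} I_{N_d} & \text{if } k = 1 \\ I_{N_d/N_k} \otimes \Gamma^{(k)} & \text{if } 1 < k \le d \end{cases} \tag{3.10}$$

where $N_d/N_k = n_{k+1} n_{k+2} \cdots n_d$, 
and 

$$\Gamma^{(k)} = \mathrm{diag}(\Gamma^{(k)}_1, \ldots, \Gamma^{(k)}_{b_k}) \tag{3.11}$$

$$\Gamma^{(k)}_j = \mathrm{diag}\left( \ldots, \Pi_{\mathrm{vol}_{M_{k-1}}(\mathbf{i}),\, m^{(k)}_j}, \ldots \right) \cdot \Pi_{m^{(k)}_j,\, N_{k-1}}, \qquad \mathbf{1} \le \mathbf{i} \le \mathbf{b}(1{:}k-1). \tag{3.12}$$

The permutation matrix $P_M$ defined by 

$$P_M = Q_d \cdots Q_2 Q_1$$

has the property that 

$$\mathrm{vec}_M(\mathcal{A}) = P_M\, \mathrm{vec}(\mathcal{A}).$$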

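Proof. Since both vec(·) and vec$_M$(·) are linear operators and any tensor is the 
sum of rank-1 tensors, it suffices to prove the theorem for the case 

$$\mathcal{A} = a^{(1)} \circ \cdots \circ a^{(d)}$$

where each $a^{(k)} \in \mathbb{R}^{n_k}$ is blocked as follows: 

$$a^{(k)} = \begin{bmatrix} a^{(k)}_1 \\ \vdots \\ a^{(k)}_{b_k} \end{bmatrix}, \qquad a^{(k)}_j \in \mathbb{R}^{m^{(k)}_j}.$$

We proceed by induction noting that the theorem is true if $d = 1$ because in that 
case, $\mathrm{vec}_M(\mathcal{A}) = \mathrm{vec}(\mathcal{A})$. Assume that the theorem holds for block tensors with order 
$d - 1$ or less with $d > 1$. Define 

$$\tilde{\mathcal{A}} = a^{(1)} \circ \cdots \circ a^{(d-1)}, \qquad \tilde{M} = M_{d-1}, \qquad \tilde{\mathbf{b}} = \mathbf{b}(1{:}d-1),$$

and observe that $\tilde{M}$ is a blocking for $\tilde{\mathcal{A}}$, an order-$(d-1)$ tensor. It follows by induction 
that 

$$\mathrm{vec}_{\tilde{M}}(\tilde{\mathcal{A}}) = P_{\tilde{M}}\, \mathrm{vec}(\tilde{\mathcal{A}}). \tag{3.13}$$

From the definition of vec$_M$(·) in (3.7), we have 

$$\mathrm{vec}_{\tilde{M}}(\tilde{\mathcal{A}}) = v = \begin{bmatrix} v_{\mathbf{1}} \\ \vdots \\ v_{\tilde{\mathbf{b}}} \end{bmatrix}, \qquad v_{\mathbf{i}} = a^{(d-1)}_{i_{d-1}} \otimes \cdots \otimes a^{(1)}_{i_1} \tag{3.14}$$

for all $\mathbf{i}$ that satisfy $\mathbf{1} \le \mathbf{i} \le \tilde{\mathbf{b}}$. Equation (2.21) says that 

$$\mathrm{vec}(\mathcal{A}) = a^{(d)} \otimes (a^{(d-1)} \otimes \cdots \otimes a^{(1)}) = a^{(d)} \otimes \mathrm{vec}(\tilde{\mathcal{A}}),$$

and so 

$$(I_{n_d} \otimes P_{\tilde{M}})\, \mathrm{vec}(\mathcal{A}) = a^{(d)} \otimes v = \begin{bmatrix} a^{(d)}_1 \\ \vdots \\ a^{(d)}_{b_d} \end{bmatrix} \otimes v = \begin{bmatrix} a^{(d)}_1 \otimes v \\ \vdots \\ a^{(d)}_{b_d} \otimes v \end{bmatrix}. \tag{3.15}$$

Using (2.8) we have for $j = 1, \ldots, b_d$ that 

$$\Gamma^{(d)}_j (a^{(d)}_j \otimes v) = \begin{bmatrix} a^{(d)}_j \otimes v_{\mathbf{1}} \\ \vdots \\ a^{(d)}_j \otimes v_{\tilde{\mathbf{b}}} \end{bmatrix}$$

where 

$$\Gamma^{(d)}_j = \mathrm{diag}\left( \Pi_{\mathrm{vol}_{\tilde{M}}(\mathbf{1}),\, m^{(d)}_j}, \ldots, \Pi_{\mathrm{vol}_{\tilde{M}}(\mathbf{b}(1:d-1)),\, m^{(d)}_j} \right) \cdot \Pi_{m^{(d)}_j,\, N_{d-1}}.$$

Thus, if $\Gamma^{(d)} = \mathrm{diag}(\Gamma^{(d)}_1, \ldots, \Gamma^{(d)}_{b_d})$, then 

$$\Gamma^{(d)} (a^{(d)} \otimes v) = \mathrm{vec}_M(\mathcal{A}). \tag{3.16}$$

Combining this equation with (3.15) we have 

$$\Gamma^{(d)} (I_{n_d} \otimes P_{\tilde{M}})\, \mathrm{vec}(\mathcal{A}) = \mathrm{vec}_M(\mathcal{A})$$

and so $P_M = \Gamma^{(d)} (I_{n_d} \otimes P_{\tilde{M}})$. But by induction 

$$P_{\tilde{M}} = \tilde{Q}_{d-1} \cdots \tilde{Q}_2 \tilde{Q}_1$$

where 

$$\tilde{Q}_k = \begin{cases} I_{N_{d-1}} & \text{if } k = 1 \\ I_{N_{d-1}/N_k} \otimes \Gamma^{(k)} & \text{if } 1 < k \le d-1. \end{cases}$$

It follows that 

$$\begin{aligned} P_M &= \Gamma^{(d)} (I_{n_d} \otimes P_{\tilde{M}}) \\ &= (I_{N_d/N_d} \otimes \Gamma^{(d)}) (I_{N_d/N_{d-1}} \otimes \Gamma^{(d-1)}) \cdots (I_{N_d/N_2} \otimes \Gamma^{(2)}) (I_{N_d}) \\ &= Q_d\, Q_{d-1} \cdots Q_2 Q_1 \end{aligned}$$

completing the proof. □ 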

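The permutation $P_M$ has a particularly simple form if the blocking is uniform in each 
dimension. 

Corollary 3.2. Suppose $M$ is defined by (3.1)-(3.5). If 

$$m^{(k)}_1 = \cdots = m^{(k)}_{b_k} = \mu_k$$

and 

$$N_k = n_1 \cdots n_k, \qquad B_k = b_1 \cdots b_k, \qquad D_k = \mu_1 \cdots \mu_k$$

for $k = 1, \ldots, d$, then $P_M = Q_d \cdots Q_2 Q_1$ where 

$$Q_k = \begin{cases} I_{N_d} & \text{if } k = 1 \\ I_{b_k N_d/N_k} \otimes \Pi_{\mu_k, B_{k-1}} \otimes I_{D_{k-1}} & \text{if } 1 < k \le d. \end{cases}$$

Proof. Observe that $\mathrm{vol}_{M_{k-1}}(\mathbf{i}) = \mu_1 \cdots \mu_{k-1} = D_{k-1}$. It follows from the definition of 
$\Gamma^{(k)}_j$ in (3.12) that 

$$\Gamma^{(k)}_j = (I_{B_{k-1}} \otimes \Pi_{D_{k-1},\mu_k})\, \Pi_{\mu_k, N_{k-1}}.$$

Using the well-known Kronecker product identity 

$$\Pi_{\mu_k, N_{k-1}} = (I_{B_{k-1}} \otimes \Pi_{\mu_k, D_{k-1}})(\Pi_{\mu_k, B_{k-1}} \otimes I_{D_{k-1}}), \qquad N_{k-1} = B_{k-1} D_{k-1},$$

it follows that 

$$\Gamma^{(k)}_j = \Pi_{\mu_k, B_{k-1}} \otimes I_{D_{k-1}}.$$

See [11]. From (3.10) we have 

$$Q_k = I_{N_d/N_k} \otimes \Gamma^{(k)} = I_{N_d/N_k} \otimes I_{b_k} \otimes \Gamma^{(k)}_1$$

and so 

$$Q_k = I_{b_k N_d/N_k} \otimes \Pi_{\mu_k, B_{k-1}} \otimes I_{D_{k-1}}.$$

This completes the proof. □ 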

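It is interesting to note that the transition from $\mathrm{vec}(\mathcal{A})$ to $\mathrm{vec}_M(\mathcal{A})$ via the sequence 

$$\mathrm{vec}(\mathcal{A}) \to Q_2 \cdot \mathrm{vec}(\mathcal{A}) \to Q_3 \cdot (Q_2 \cdot \mathrm{vec}(\mathcal{A})) \to \cdots \to Q_d \cdot (Q_{d-1} \cdots Q_2 \cdot \mathrm{vec}(\mathcal{A}))$$

is actually a sequence of transpositions. To illustrate, assume $\mathcal{A} \in \mathbb{R}^{n_1\times n_2\times n_3\times n_4}$ and 
define the order-8 tensor $\mathcal{A}^{(1)}$ by 

$$\mathcal{A}(i_1,i_2,i_3,i_4) = \mathcal{A}^{(1)}(\delta_1, \beta_1, \delta_2, \beta_2, \delta_3, \beta_3, \delta_4, \beta_4)$$

where $\mathbf{1} \le \mathbf{i} \le \mathbf{n}$ and the $\delta_k$ and $\beta_k$ are uniquely defined by 

$$i_k = \delta_k + (\beta_k - 1)\mu_k, \qquad 1 \le \delta_k \le \mu_k.$$

This says that $\mathcal{A}^{(1)} \in \mathbb{R}^{\mu_1\times b_1\times \mu_2\times b_2\times \mu_3\times b_3\times \mu_4\times b_4}$. In the $d = 4$ case, the $Q$-matrices 
in Corollary 3.2 are given by 

$$\begin{aligned} Q_1 &= I_{n_1 n_2 n_3 n_4} \\ Q_2 &= I_{b_2 n_3 n_4} \otimes \Pi_{\mu_2, b_1} \otimes I_{\mu_1} \\ Q_3 &= I_{b_3 n_4} \otimes \Pi_{\mu_3, b_1 b_2} \otimes I_{\mu_1\mu_2} \\ Q_4 &= I_{b_4} \otimes \Pi_{\mu_4, b_1 b_2 b_3} \otimes I_{\mu_1\mu_2\mu_3}. \end{aligned}$$

Note from Lemma 2.1 that these permutations correspond to transpositions. Indeed, 
if we define the tensors $\mathcal{A}^{(2)}$, $\mathcal{A}^{(3)}$, and $\mathcal{A}^{(4)}$ by 

$$\left. \begin{array}{l} \mathcal{A}^{(2)}(\delta_1,\delta_2,\beta_1,\beta_2,\delta_3,\beta_3,\delta_4,\beta_4) \\ \mathcal{A}^{(3)}(\delta_1,\delta_2,\delta_3,\beta_1,\beta_2,\beta_3,\delta_4,\beta_4) \\ \mathcal{A}^{(4)}(\delta_1,\delta_2,\delta_3,\delta_4,\beta_1,\beta_2,\beta_3,\beta_4) \end{array} \right\} = \mathcal{A}^{(1)}(\delta_1,\beta_1,\delta_2,\beta_2,\delta_3,\beta_3,\delta_4,\beta_4)$$

then it can be shown via Lemma 2.1 that 

$$\begin{aligned} \mathrm{vec}(\mathcal{A}^{(1)}) &= Q_1 \mathrm{vec}(\mathcal{A}) = \mathrm{vec}(\mathcal{A}) \\ \mathrm{vec}(\mathcal{A}^{(2)}) &= Q_2 \mathrm{vec}(\mathcal{A}^{(1)}) \\ \mathrm{vec}(\mathcal{A}^{(3)}) &= Q_3 \mathrm{vec}(\mathcal{A}^{(2)}) \\ \mathrm{vec}_M(\mathcal{A}) = \mathrm{vec}(\mathcal{A}^{(4)}) &= Q_4 \mathrm{vec}(\mathcal{A}^{(3)}). \end{aligned}$$

Thus, the order-8 tensor $\mathcal{A}^{(4)}$ has the property that $\mathrm{vec}(\mathcal{A}^{(4)}) = \mathrm{vec}_M(\mathcal{A})$. Moreover, 
$\mathcal{A}(\mathbf{i}) = \mathcal{A}_{\boldsymbol{\beta}}(\boldsymbol{\delta})$ showing that entry $\mathbf{i}$ is entry $\boldsymbol{\delta}$ of block $\boldsymbol{\beta}$. 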
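The closing observation, that for a uniform blocking the block vec ordering is the ordinary vec ordering of a reshaped tensor indexed by $(\boldsymbol{\delta}, \boldsymbol{\beta})$, can be checked directly. The sketch below is an illustration under stated assumptions: it computes the position of every entry in the block vec ordering by brute-force enumeration (not via the $Q_k$ factors), and compares it with `ivec` applied to the within-block index followed by the block index. The names `vecM_positions`, `mu`, and `b` are hypothetical.

```python
from itertools import product

def ivec(i, n):
    """1-based position of multi-index i in vec order (first index fastest)."""
    pos, stride = 1, 1
    for ik, nk in zip(i, n):
        pos += (ik - 1) * stride
        stride *= nk
    return pos

def index_range(n):
    """All 1-based multi-indices 1 <= i <= n, listed in vec order."""
    return [t[::-1] for t in product(*(range(1, x + 1) for x in reversed(n)))]

def vecM_positions(M):
    """Map each tensor index to its 1-based position in the block vec ordering."""
    pos, count = {}, 0
    for k in index_range([len(m) for m in M]):          # blocks in vec order
        lo = [sum(m[:kt - 1]) for m, kt in zip(M, k)]   # offset of block k
        dims = [m[kt - 1] for m, kt in zip(M, k)]       # size of block k
        for e in index_range(dims):                     # entries in vec order
            count += 1
            pos[tuple(l + x for l, x in zip(lo, e))] = count
    return pos

# Uniform blocking of a 4-by-6 matrix into 2-by-3 blocks (mu) on a 2-by-2 grid (b).
mu, b = (2, 3), (2, 2)
M = [[2, 2], [3, 3]]
pos = vecM_positions(M)

agree = all(
    pos[i] == ivec(tuple((ik - 1) % m + 1 for ik, m in zip(i, mu))       # delta
                   + tuple((ik - 1) // m + 1 for ik, m in zip(i, mu)),   # beta
                   mu + b)
    for i in pos)
```

The same enumeration works for non-uniform blockings, where it gives the index-vector form of the permutation without forming any matrices.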
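3.3. Block Unfoldings. We now specify the permutation matrices $P_R$ and $P_C$ 
in (3.9) that turn $\mathcal{A}_{\mathbf{r}\times\mathbf{c}}$ into a block matrix with block entries that are $\mathbf{r} \times \mathbf{c}$ unfoldings 
of $\mathcal{A}$'s blocks. 

Theorem 3.3. Suppose $M = \{\mathbf{m}^{(1)}, \ldots, \mathbf{m}^{(d)}\}$ is a blocking of $\mathcal{A} \in \mathbb{R}^{n_1\times\cdots\times n_d}$ 
with 

$$\mathbf{m}^{(k)} = [\,m^{(k)}_1, \ldots, m^{(k)}_{b_k}\,], \qquad k = 1, \ldots, d.$$

Let $e$ be an integer that satisfies $1 \le e \le d$ and assume that $\mathbf{p}$ is a permutation of $1{:}d$. 
Define 

$$\mathbf{r} = \mathbf{p}(1{:}e), \qquad R = \{\mathbf{m}^{(r_1)}, \ldots, \mathbf{m}^{(r_e)}\}, \qquad B_{rows} = b_{r_1} \cdots b_{r_e}$$

$$\mathbf{c} = \mathbf{p}(e+1{:}d), \qquad C = \{\mathbf{m}^{(c_1)}, \ldots, \mathbf{m}^{(c_{d-e})}\}, \qquad B_{cols} = b_{c_1} \cdots b_{c_{d-e}}.$$

The matrix 

$$\mathcal{A}_{R\times C} = P_R\, \mathcal{A}_{\mathbf{r}\times\mathbf{c}}\, P_C^T$$

is a $B_{rows}$-by-$B_{cols}$ block matrix whose block entries are specified by 

$$(\mathcal{A}_{R\times C})_{\mathbf{k}(\mathbf{r}),\mathbf{k}(\mathbf{c})} = (\mathcal{A}_{\mathbf{k}})_{\mathbf{r}\times\mathbf{c}}, \qquad \mathbf{1} \le \mathbf{k} \le \mathbf{b}.$$

That is to say, if $\mu = \mathrm{ivec}(\mathbf{k}(\mathbf{r}), \mathbf{b}(\mathbf{r}))$ and $\tau = \mathrm{ivec}(\mathbf{k}(\mathbf{c}), \mathbf{b}(\mathbf{c}))$, then the $(\mu,\tau)$ block 
of $\mathcal{A}_{R\times C}$ is the $\mathbf{r} \times \mathbf{c}$ unfolding of the $\mathbf{k}$-th block of $\mathcal{A}$. 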

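Proof. By linearity there is no loss of generality in assuming that 

$$\mathcal{A} = a^{(1)} \circ \cdots \circ a^{(d)}$$

where each $a^{(k)} \in \mathbb{R}^{n_k}$ is blocked as follows: 

$$a^{(k)} = \begin{bmatrix} a^{(k)}_1 \\ \vdots \\ a^{(k)}_{b_k} \end{bmatrix}, \qquad a^{(k)}_j \in \mathbb{R}^{m^{(k)}_j}. \tag{3.17}$$

From (2.24) we know that 

$$\mathcal{A}_{\mathbf{r}\times\mathbf{c}} = \mathrm{vec}(a^{(r_1)} \circ \cdots \circ a^{(r_e)}) \cdot \mathrm{vec}(a^{(c_1)} \circ \cdots \circ a^{(c_{d-e})})^T.$$

Since $R$ is a blocking for $a^{(r_1)} \circ \cdots \circ a^{(r_e)}$ and $C$ is a blocking for $a^{(c_1)} \circ \cdots \circ a^{(c_{d-e})}$, 
it follows from Theorem 3.1 that 

$$P_R\, \mathcal{A}_{\mathbf{r}\times\mathbf{c}}\, P_C^T = y z^T$$

where $y = \mathrm{vec}_R(a^{(r_1)} \circ \cdots \circ a^{(r_e)})$ and $z = \mathrm{vec}_C(a^{(c_1)} \circ \cdots \circ a^{(c_{d-e})})$. These block 
vectors are specified by 

$$y = \begin{bmatrix} y_{\mathbf{1}} \\ \vdots \\ y_{\mathbf{b}(\mathbf{r})} \end{bmatrix}, \qquad y_{\mathbf{i}} = \mathrm{vec}(a^{(r_1)}_{i_1} \circ \cdots \circ a^{(r_e)}_{i_e}), \qquad \mathbf{1} \le \mathbf{i} \le \mathbf{b}(\mathbf{r}) \tag{3.18}$$

$$z = \begin{bmatrix} z_{\mathbf{1}} \\ \vdots \\ z_{\mathbf{b}(\mathbf{c})} \end{bmatrix}, \qquad z_{\mathbf{j}} = \mathrm{vec}(a^{(c_1)}_{j_1} \circ \cdots \circ a^{(c_{d-e})}_{j_{d-e}}), \qquad \mathbf{1} \le \mathbf{j} \le \mathbf{b}(\mathbf{c}) \tag{3.19}$$

and so the $(\mathbf{i},\mathbf{j})$-th block of $\mathcal{A}_{R\times C}$ is given by 

$$(\mathcal{A}_{R\times C})_{\mathbf{i},\mathbf{j}} = y_{\mathbf{i}} z_{\mathbf{j}}^T.$$

On the other hand, from (3.17) 

$$\begin{aligned} (\mathcal{A}_{\mathbf{k}})_{\mathbf{r}\times\mathbf{c}} &= \left( a^{(1)}_{k_1} \circ \cdots \circ a^{(d)}_{k_d} \right)_{\mathbf{r}\times\mathbf{c}} \\ &= \mathrm{vec}\left( a^{(r_1)}_{k_{r_1}} \circ \cdots \circ a^{(r_e)}_{k_{r_e}} \right) \cdot \mathrm{vec}\left( a^{(c_1)}_{k_{c_1}} \circ \cdots \circ a^{(c_{d-e})}_{k_{c_{d-e}}} \right)^T. \end{aligned} \tag{3.20}$$

It follows from (3.18)-(3.20) that if $\mathbf{i} = \mathbf{k}(\mathbf{r})$ and $\mathbf{j} = \mathbf{k}(\mathbf{c})$, then 

$$(\mathcal{A}_{\mathbf{k}})_{\mathbf{r}\times\mathbf{c}} = y_{\mathbf{i}} z_{\mathbf{j}}^T = (\mathcal{A}_{R\times C})_{\mathbf{i},\mathbf{j}}$$

which completes the proof. □ 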



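To illustrate the theorem, suppose $\mathcal{A}$ is a 2-by-4-by-3-by-2 block tensor. If $\mathbf{r} = [1\ 3]$ 
and $\mathbf{c} = [2\ 4]$, then 

$$\mathcal{A}_{R\times C} = \begin{array}{cccccccc|c} A_{1111} & A_{1211} & A_{1311} & A_{1411} & A_{1112} & A_{1212} & A_{1312} & A_{1412} & (1,1) \\ A_{2111} & A_{2211} & A_{2311} & A_{2411} & A_{2112} & A_{2212} & A_{2312} & A_{2412} & (2,1) \\ A_{1121} & A_{1221} & A_{1321} & A_{1421} & A_{1122} & A_{1222} & A_{1322} & A_{1422} & (1,2) \\ A_{2121} & A_{2221} & A_{2321} & A_{2421} & A_{2122} & A_{2222} & A_{2322} & A_{2422} & (2,2) \\ A_{1131} & A_{1231} & A_{1331} & A_{1431} & A_{1132} & A_{1232} & A_{1332} & A_{1432} & (1,3) \\ A_{2131} & A_{2231} & A_{2331} & A_{2431} & A_{2132} & A_{2232} & A_{2332} & A_{2432} & (2,3) \\ \hline (1,1) & (2,1) & (3,1) & (4,1) & (1,2) & (2,2) & (3,2) & (4,2) & \end{array}$$

where $A_{\alpha\beta\gamma\delta} = (\mathcal{A}_{\alpha\beta\gamma\delta})_{\mathbf{r}\times\mathbf{c}}$. Note the multi-indexing of the block rows and columns. 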
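The content of Theorem 3.3 can be exercised on a small example. The sketch below is a brute-force illustration rather than an implementation of the $Q_k$ factorization: it builds the row and column permutations as index vectors by listing the row and column multi-indices in block vec order, applies them to an ordinary unfolding, and compares one block of the result with the unfolding of the corresponding tensor block. The tensor size, blocking, and helper names are illustrative assumptions.

```python
from itertools import product

def ivec(i, n):
    """1-based position of multi-index i in vec order (first index fastest)."""
    pos, stride = 1, 1
    for ik, nk in zip(i, n):
        pos += (ik - 1) * stride
        stride *= nk
    return pos

def index_range(n):
    """All 1-based multi-indices 1 <= i <= n, listed in vec order."""
    return [t[::-1] for t in product(*(range(1, x + 1) for x in reversed(n)))]

def unfold(A, n, r, c):
    """r-by-c unfolding of a tensor stored as {1-based index tuple: value}."""
    nr, nc = [n[k - 1] for k in r], [n[k - 1] for k in c]
    U = [[None] * len(index_range(nc)) for _ in index_range(nr)]
    for x, val in A.items():
        i, j = [x[k - 1] for k in r], [x[k - 1] for k in c]
        U[ivec(i, nr) - 1][ivec(j, nc) - 1] = val
    return U

def vecM_order(M):
    """Multi-indices of a tensor with blocking M, listed in block vec order."""
    order = []
    for k in index_range([len(m) for m in M]):
        lo = [sum(m[:kt - 1]) for m, kt in zip(M, k)]
        dims = [m[kt - 1] for m, kt in zip(M, k)]
        order += [tuple(l + e for l, e in zip(lo, ei)) for ei in index_range(dims)]
    return order

n = (3, 4, 3)
M = [[1, 2], [2, 2], [2, 1]]            # blocking of modes 1, 2, 3
r, c = [1, 3], [2]
A = {x: 100 * x[0] + 10 * x[1] + x[2] for x in index_range(n)}

nr, nc = [n[k - 1] for k in r], [n[k - 1] for k in c]
R, C = [M[k - 1] for k in r], [M[k - 1] for k in c]
rows = [ivec(i, nr) - 1 for i in vecM_order(R)]    # row permutation (index vector)
cols = [ivec(j, nc) - 1 for j in vecM_order(C)]    # column permutation
U = unfold(A, n, r, c)
ARC = [[U[a][b] for b in cols] for a in rows]      # block unfolding

# Block k = (2, 1, 1) of A is A(2:3, 1:2, 1:2); unfold it directly.
lo, dims = (1, 0, 0), (2, 2, 2)
Ak = {e: A[tuple(l + x for l, x in zip(lo, e))] for e in index_range(dims)}
Uk = unfold(Ak, dims, r, c)

# Block row (k1, k3) = (2, 1) follows the 2 rows of block row (1, 1);
# block column k2 = 1 occupies the first 2 columns.
sub = [row[0:2] for row in ARC[2:6]]
```

In this example the row permutation is nontrivial (`rows` is not the identity), yet the selected contiguous block of `ARC` coincides with the unfolding of the tensor block.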
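3.4. A Special Case. Returning to the second example in §2.4, suppose 

$$\mathcal{B} = B^{(1)} \circ \cdots \circ B^{(d)}$$

where 

$$B^{(\ell)} \in \mathbb{R}^{q_\ell\times n_\ell}$$

for $\ell = 1, \ldots, d$. Assume that $\{\mathbf{u}^{(\ell)}, \mathbf{v}^{(\ell)}\}$ is a blocking for $B^{(\ell)}$ and note that 

$$M = \{\mathbf{u}^{(1)}, \mathbf{v}^{(1)}, \ldots, \mathbf{u}^{(d)}, \mathbf{v}^{(d)}\} \tag{3.21}$$

is a blocking for $\mathcal{B}$. Let $B^{(\ell)}_{i,j}$ denote block $(i,j)$ of $B^{(\ell)}$. If 

$$\mathbf{k} = [\,i_1, j_1, \ldots, i_d, j_d\,]$$

then the $\mathbf{k}$-th block of $\mathcal{B}$ is given by 

$$\mathcal{B}_{\mathbf{k}} = B^{(1)}_{i_1,j_1} \circ \cdots \circ B^{(d)}_{i_d,j_d}.$$

If 

$$\mathbf{r} = 1{:}2{:}2d$$

$$\mathbf{c} = 2{:}2{:}2d$$

$$R = \{\mathbf{u}^{(1)}, \ldots, \mathbf{u}^{(d)}\} \tag{3.22}$$

$$C = \{\mathbf{v}^{(1)}, \ldots, \mathbf{v}^{(d)}\}, \tag{3.23}$$

then by applying (3.17) and (2.25) we see that 

$$(\mathcal{B}_{R\times C})_{\mathbf{i},\mathbf{j}} = \left( B^{(1)}_{i_1,j_1} \circ \cdots \circ B^{(d)}_{i_d,j_d} \right)_{\mathbf{r}\times\mathbf{c}} = B^{(d)}_{i_d,j_d} \otimes \cdots \otimes B^{(1)}_{i_1,j_1}. \tag{3.24}$$

Here, the notation $(\mathcal{B}_{R\times C})_{\mathbf{i},\mathbf{j}}$ denotes the block in block row $\mathrm{ivec}(\mathbf{i}, \mathbf{b}(\mathbf{r}))$ and block 
column $\mathrm{ivec}(\mathbf{j}, \mathbf{b}(\mathbf{c}))$, where $\mathbf{b}(\mathbf{r})$ and $\mathbf{b}(\mathbf{c})$ hold the lengths of the blocking vectors in $R$ and $C$. 
This result is key to the development of a block-level multilinear product which we pursue in §4.2. 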

4. Blocked Contractions. We next apply our block tensor "technology" to the 
problem of computing a contraction between two tensors. A multi-index summation 
notation will be used to describe the summations. If $\mathbf{n}$ is a length-$d$ index vector, 
then 

$$\sum_{\mathbf{i}=\mathbf{1}}^{\mathbf{n}} \;\equiv\; \sum_{i_1=1}^{n_1} \cdots \sum_{i_d=1}^{n_d}.$$

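4.1. The General Case. It is instructive to work through a small, motivating 
example before we present the main results. Suppose we are given $\mathcal{F} \in \mathbb{R}^{\alpha_1\times\cdots\times\alpha_4}$ 
and $\mathcal{G} \in \mathbb{R}^{\beta_1\times\cdots\times\beta_5}$ and wish to compute the order-5 tensor $\mathcal{H} \in \mathbb{R}^{\alpha_1\times\alpha_2\times\beta_3\times\beta_4\times\beta_5}$ 
defined by 

$$\mathcal{H}(i_1,i_2,j_1,j_2,j_3) = \sum_{k_1=1}^{\alpha_3} \sum_{k_2=1}^{\alpha_4} \mathcal{F}(i_1,i_2,k_1,k_2) \cdot \mathcal{G}(k_1,k_2,j_1,j_2,j_3). \tag{4.1}$$

Of course, for this to make sense, we must have $\alpha_3 = \beta_1$ and $\alpha_4 = \beta_2$. It is well known 
that a tensor contraction such as this can be "reshaped" into a single matrix-matrix 
multiplication. To see this we rewrite (4.1) using multi-index notation, 

$$\mathcal{H}(\mathbf{i},\mathbf{j}) = \sum_{\mathbf{k}=\mathbf{1}}^{\boldsymbol{\alpha}(3:4)} \mathcal{F}(\mathbf{i},\mathbf{k}) \cdot \mathcal{G}(\mathbf{k},\mathbf{j}). \tag{4.2}$$

Define the index vectors 

$$\mathbf{r} = [1\ 2], \qquad \boldsymbol{\lambda} = [3\ 4], \qquad \boldsymbol{\psi} = [1\ 2], \qquad \mathbf{c} = [3\ 4\ 5]$$

and note that $\mathbf{1} \le \mathbf{i} \le \boldsymbol{\alpha}(\mathbf{r})$ and $\mathbf{1} \le \mathbf{j} \le \boldsymbol{\beta}(\mathbf{c})$ in (4.2). Recall from (2.17)-(2.20) that 
the rows and columns of a tensor unfolding are vecs of reduced-order subtensors. In 
particular 

$$\mathcal{F}_{\mathbf{r}\times\boldsymbol{\lambda}}(\mathbf{i},:) = \mathrm{vec}(\mathcal{F}^{(\mathbf{i})})^T$$

$$\mathcal{G}_{\boldsymbol{\psi}\times\mathbf{c}}(:,\mathbf{j}) = \mathrm{vec}(\mathcal{G}^{(\mathbf{j})})$$

where $\mathcal{F}^{(\mathbf{i})} \in \mathbb{R}^{\alpha_3\times\alpha_4}$ and $\mathcal{G}^{(\mathbf{j})} \in \mathbb{R}^{\beta_1\times\beta_2}$ are defined by 

$$\mathcal{F}^{(\mathbf{i})}(k_1,k_2) = \mathcal{F}(i_1,i_2,k_1,k_2), \qquad \mathbf{i} = [\,i_1\ i_2\,]$$

$$\mathcal{G}^{(\mathbf{j})}(k_1,k_2) = \mathcal{G}(k_1,k_2,j_1,j_2,j_3), \qquad \mathbf{j} = [\,j_1\ j_2\ j_3\,].$$

It follows from (4.2) that 

$$\mathcal{H}(\mathbf{i},\mathbf{j}) = \sum_{k_1=1}^{\alpha_3} \sum_{k_2=1}^{\alpha_4} \mathcal{F}^{(\mathbf{i})}(k_1,k_2) \cdot \mathcal{G}^{(\mathbf{j})}(k_1,k_2) = \mathcal{F}_{\mathbf{r}\times\boldsymbol{\lambda}}(\mathbf{i},:) \cdot \mathcal{G}_{\boldsymbol{\psi}\times\mathbf{c}}(:,\mathbf{j})$$

and thus 

$$\mathcal{H}_{[1\,2]\times[3\,4\,5]} = \mathcal{F}_{\mathbf{r}\times\boldsymbol{\lambda}}\, \mathcal{G}_{\boldsymbol{\psi}\times\mathbf{c}}.$$

In this example, the summation is over the last two modes of $\mathcal{F}$ and the first two 
modes of $\mathcal{G}$. These are convenient locations for the summation indices because the 
contraction $\mathcal{H}$ is then easily seen to be "isomorphic" to a matrix-matrix product of 
simple tensor unfoldings. 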
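The motivating example can be reproduced numerically: unfold the two tensors, multiply the resulting matrices, and compare with the double sum (4.1). The following is a minimal sketch with arbitrary small dimensions and integer test data; the dictionary storage and the helper names are illustrative assumptions.

```python
from itertools import product

def ivec(i, n):
    """1-based position of multi-index i in vec order (first index fastest)."""
    pos, stride = 1, 1
    for ik, nk in zip(i, n):
        pos += (ik - 1) * stride
        stride *= nk
    return pos

def index_range(n):
    """All 1-based multi-indices 1 <= i <= n, listed in vec order."""
    return [t[::-1] for t in product(*(range(1, x + 1) for x in reversed(n)))]

def unfold(A, n, r, c):
    """r-by-c unfolding of a tensor stored as {1-based index tuple: value}."""
    nr, nc = [n[k - 1] for k in r], [n[k - 1] for k in c]
    U = [[None] * len(index_range(nc)) for _ in index_range(nr)]
    for x, val in A.items():
        i, j = [x[k - 1] for k in r], [x[k - 1] for k in c]
        U[ivec(i, nr) - 1][ivec(j, nc) - 1] = val
    return U

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

alpha = (2, 3, 2, 2)        # F is alpha_1 x ... x alpha_4
beta = (2, 2, 2, 3, 2)      # G is beta_1 x ... x beta_5, with beta(1:2) == alpha(3:4)
F = {x: ivec(x, alpha) % 7 - 3 for x in index_range(alpha)}
G = {x: ivec(x, beta) % 5 - 2 for x in index_range(beta)}

# Matrix route: H_{[1 2]x[3 4 5]} = F_{[1 2]x[3 4]} * G_{[1 2]x[3 4 5]}
Hmat = matmul(unfold(F, alpha, [1, 2], [3, 4]),
              unfold(G, beta, [1, 2], [3, 4, 5]))

def H_direct(i, j):
    """Entry H(i1,i2,j1,j2,j3) computed from the double sum (4.1)."""
    return sum(F[i + k] * G[k + j] for k in index_range(alpha[2:]))

match = all(Hmat[ivec(i, alpha[:2]) - 1][ivec(j, beta[2:]) - 1] == H_direct(i, j)
            for i in index_range(alpha[:2]) for j in index_range(beta[2:]))
```

The comparison relies on both unfoldings ordering the summation index $\mathbf{k}$ the same way, which is exactly what the vec-based row and column indexing guarantees.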

If the summation modes are arbitrarily positioned, then they can be moved to 
these friendly locations through transposition. This result is widely known and 
exploited, e.g., [3], [11]. Nevertheless, in keeping with the spirit of this paper we think 
that it is useful to include a formal verification of this important maneuver. 

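Theorem 4.1. Suppose $\mathcal{F} \in \mathbb{R}^{\alpha_1\times\cdots\times\alpha_{f+\ell}}$, $\mathcal{G} \in \mathbb{R}^{\beta_1\times\cdots\times\beta_{g+\ell}}$, and that $\mathbf{p}$ and $\mathbf{q}$ 
are permutations of $1{:}f+\ell$ and $1{:}g+\ell$ respectively. Define 

$$\mathbf{r} = \mathbf{p}(1{:}f), \qquad \boldsymbol{\lambda} = \mathbf{p}((f+1){:}(f+\ell))$$

$$\boldsymbol{\psi} = \mathbf{q}(1{:}\ell), \qquad \mathbf{c} = \mathbf{q}((\ell+1){:}(\ell+g))$$

and assume $\boldsymbol{\alpha}(\boldsymbol{\lambda}) = \boldsymbol{\beta}(\boldsymbol{\psi})$. If $\mathcal{H} \in \mathbb{R}^{\alpha_{r_1}\times\cdots\times\alpha_{r_f}\times\beta_{c_1}\times\cdots\times\beta_{c_g}}$ is defined by 

$$\mathcal{H}(\mathbf{i},\mathbf{j}) = \sum_{\mathbf{k}=\mathbf{1}}^{\boldsymbol{\alpha}(\boldsymbol{\lambda})} \mathcal{F}^{<\mathbf{p}>}(\mathbf{i},\mathbf{k})\, \mathcal{G}^{<\mathbf{q}>}(\mathbf{k},\mathbf{j}), \qquad \mathbf{1} \le \mathbf{i} \le \boldsymbol{\alpha}(\mathbf{r}), \quad \mathbf{1} \le \mathbf{j} \le \boldsymbol{\beta}(\mathbf{c}), \tag{4.3}$$

then 

$$\mathcal{H}_{[1:f]\times[f+1:f+g]} = \mathcal{F}_{\mathbf{r}\times\boldsymbol{\lambda}}\, \mathcal{G}_{\boldsymbol{\psi}\times\mathbf{c}}. \tag{4.4}$$

Proof. The assumption $\boldsymbol{\alpha}(\boldsymbol{\lambda}) = \boldsymbol{\beta}(\boldsymbol{\psi})$ ensures that the summations in (4.3) are 
well defined. Using (2.17)-(2.20) we have 

$$\mathcal{F}_{\mathbf{r}\times\boldsymbol{\lambda}}(\mathbf{i},:) = \mathrm{vec}(\mathcal{F}^{(\mathbf{i})})^T$$

$$\mathcal{G}_{\boldsymbol{\psi}\times\mathbf{c}}(:,\mathbf{j}) = \mathrm{vec}(\mathcal{G}^{(\mathbf{j})})$$

where $\mathcal{F}^{(\mathbf{i})} \in \mathbb{R}^{\alpha_{\lambda_1}\times\cdots\times\alpha_{\lambda_\ell}}$ and $\mathcal{G}^{(\mathbf{j})} \in \mathbb{R}^{\beta_{\psi_1}\times\cdots\times\beta_{\psi_\ell}}$ are defined by 

$$\mathcal{F}^{(\mathbf{i})}(\mathbf{k}) = \mathcal{F}^{<\mathbf{p}>}(i_1,\ldots,i_f,k_1,\ldots,k_\ell)$$

$$\mathcal{G}^{(\mathbf{j})}(\mathbf{k}) = \mathcal{G}^{<\mathbf{q}>}(k_1,\ldots,k_\ell,j_1,\ldots,j_g).$$

It follows that for all $\mathbf{i}$ and $\mathbf{j}$ that satisfy $\mathbf{1} \le \mathbf{i} \le \boldsymbol{\alpha}(\mathbf{r})$ and $\mathbf{1} \le \mathbf{j} \le \boldsymbol{\beta}(\mathbf{c})$ we have 

$$\begin{aligned} \mathcal{H}(\mathbf{i},\mathbf{j}) &= \sum_{\mathbf{k}=\mathbf{1}}^{\boldsymbol{\alpha}(\boldsymbol{\lambda})} \mathcal{F}^{<\mathbf{p}>}(\mathbf{i},\mathbf{k}) \cdot \mathcal{G}^{<\mathbf{q}>}(\mathbf{k},\mathbf{j}) \\ &= \sum_{\mathbf{k}=\mathbf{1}}^{\boldsymbol{\alpha}(\boldsymbol{\lambda})} \mathcal{F}^{(\mathbf{i})}(\mathbf{k}) \cdot \mathcal{G}^{(\mathbf{j})}(\mathbf{k}) = \mathcal{F}_{\mathbf{r}\times\boldsymbol{\lambda}}(\mathbf{i},:) \cdot \mathcal{G}_{\boldsymbol{\psi}\times\mathbf{c}}(:,\mathbf{j}) \end{aligned}$$

which, using (2.14)-(2.18), implies (4.4). □ 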

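It is instructive to illustrate what the theorem "says" when $\mathbf{c} = \emptyset$. Suppose 
$\mathcal{F} \in \mathbb{R}^{\alpha_1\times\cdots\times\alpha_5}$ and $\mathcal{G} \in \mathbb{R}^{\beta_1\times\beta_2}$ with $\alpha_2 = \beta_2$, $\alpha_3 = \beta_1$. If the tensor $\mathcal{H} \in \mathbb{R}^{\alpha_5\times\alpha_1\times\alpha_4}$ 
is defined by the contraction 

$$\mathcal{H}(i_1,i_2,i_3) = \sum_{\mathbf{k}=\mathbf{1}}^{\boldsymbol{\alpha}(2:3)} \mathcal{F}(i_2,k_1,k_2,i_3,i_1)\, \mathcal{G}(k_2,k_1),$$

then in the notation of the theorem we have $f = 3$, $\ell = 2$, $g = 0$, $\mathbf{p} = [5\ 1\ 4\ 2\ 3]$, and 
$\mathbf{q} = [2\ 1]$. It follows that $\mathbf{r} = [5\ 1\ 4]$, $\mathbf{c} = \emptyset$, $\boldsymbol{\lambda} = [2\ 3]$, and $\boldsymbol{\psi} = [2\ 1]$. Thus, 
we may conclude from (4.4) that 

$$\mathcal{H}_{[1:3]\times\emptyset} = \mathrm{vec}(\mathcal{H}) = \mathcal{F}_{[5\,1\,4]\times[2\,3]} \cdot \mathcal{G}_{[2\,1]\times\emptyset} = \mathcal{F}_{[5\,1\,4]\times[2\,3]} \cdot \mathrm{vec}(\mathcal{G}^T),$$

a matrix-vector product. 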

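If the tensors $\mathcal{F}$ and $\mathcal{G}$ are "blocked conformally", then (4.3) can be reformulated 
as a product of two block matrices. 

Corollary 4.2. Assume that the notation and conditions of Theorem 4.1 hold. 
Let 

$$S = \{\mathbf{s}^{(1)}, \ldots, \mathbf{s}^{(f+\ell)}\} \tag{4.5}$$

be a blocking for $\mathcal{F}$ and set 

$$R = \{\mathbf{s}^{(r_1)}, \ldots, \mathbf{s}^{(r_f)}\}, \qquad \Lambda = \{\mathbf{s}^{(\lambda_1)}, \ldots, \mathbf{s}^{(\lambda_\ell)}\}.$$

Likewise, let 

$$T = \{\mathbf{t}^{(1)}, \ldots, \mathbf{t}^{(g+\ell)}\} \tag{4.6}$$

be a blocking for $\mathcal{G}$ and set 

$$\Psi = \{\mathbf{t}^{(\psi_1)}, \ldots, \mathbf{t}^{(\psi_\ell)}\}, \qquad C = \{\mathbf{t}^{(c_1)}, \ldots, \mathbf{t}^{(c_g)}\}.$$

If 

$$\mathbf{s}^{(\lambda_k)} = \mathbf{t}^{(\psi_k)}, \qquad k = 1, \ldots, \ell, \tag{4.7}$$

then with respect to the tensor $\mathcal{H}$, $R$ is a blocking for modes 1 through $f$, $C$ is a 
blocking for modes $f+1$ through $f+g$, and 

$$\mathcal{H}_{R\times C} = \mathcal{F}_{R\times\Lambda} \cdot \mathcal{G}_{\Psi\times C}. \tag{4.8}$$

Proof. From Theorem 3.3 we have 

$$\mathcal{F}_{R\times\Lambda} = P_R\, \mathcal{F}_{\mathbf{r}\times\boldsymbol{\lambda}}\, P_\Lambda^T, \qquad \mathcal{G}_{\Psi\times C} = P_\Psi\, \mathcal{G}_{\boldsymbol{\psi}\times\mathbf{c}}\, P_C^T.$$

Since $\{\mathbf{s}^{(r_1)}, \ldots, \mathbf{s}^{(r_f)}, \mathbf{t}^{(c_1)}, \ldots, \mathbf{t}^{(c_g)}\}$ is a blocking for $\mathcal{H}$ we also have 

$$\mathcal{H}_{R\times C} = P_R \cdot \mathcal{H}_{[1:f]\times[f+1:f+g]} \cdot P_C^T.$$

The conformability condition (4.7) implies $P_\Lambda = P_\Psi$ and so it follows from (4.4) that 

$$\begin{aligned} \mathcal{H}_{R\times C} &= P_R (\mathcal{F}_{\mathbf{r}\times\boldsymbol{\lambda}} \cdot \mathcal{G}_{\boldsymbol{\psi}\times\mathbf{c}}) P_C^T \\ &= (P_R\, \mathcal{F}_{\mathbf{r}\times\boldsymbol{\lambda}}\, P_\Lambda^T)(P_\Psi\, \mathcal{G}_{\boldsymbol{\psi}\times\mathbf{c}}\, P_C^T) = \mathcal{F}_{R\times\Lambda} \cdot \mathcal{G}_{\Psi\times C} \end{aligned}$$

completing the proof. □ 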

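Thus, the tensor $\mathcal{H}$ in (4.3) can be computed as either a matrix product (4.4) or as a 
block matrix product (4.8). For the latter case, we develop recipes for the blocks of 
$\mathcal{H}_{R\times C}$. Let $b^{(F)}_j$ be the length of the blocking vector $\mathbf{s}^{(j)}$ in (4.5) and let $b^{(G)}_j$ be the 
length of the blocking vector $\mathbf{t}^{(j)}$ in (4.6). Note that if 

$$B^{(F)}_{rows} = b^{(F)}_{r_1} \cdots b^{(F)}_{r_f}, \qquad B^{(F)}_{cols} = b^{(F)}_{\lambda_1} \cdots b^{(F)}_{\lambda_\ell},$$

$$B^{(G)}_{rows} = b^{(G)}_{\psi_1} \cdots b^{(G)}_{\psi_\ell}, \qquad B^{(G)}_{cols} = b^{(G)}_{c_1} \cdots b^{(G)}_{c_g},$$

then (4.7) implies $B^{(F)}_{cols} = B^{(G)}_{rows}$ and we observe that 

$$\left\{ \begin{array}{c} \mathcal{F}_{R\times\Lambda} \\ \mathcal{G}_{\Psi\times C} \\ \mathcal{H}_{R\times C} \end{array} \right\} \text{ is a } \left\{ \begin{array}{c} B^{(F)}_{rows} \text{ by } B^{(F)}_{cols} \\ B^{(G)}_{rows} \text{ by } B^{(G)}_{cols} \\ B^{(F)}_{rows} \text{ by } B^{(G)}_{cols} \end{array} \right\} \text{ block matrix.}$$

If $\mathbf{1} \le \boldsymbol{\mu} \le \mathbf{b}^{(F)}(\mathbf{r})$ and $\mathbf{1} \le \boldsymbol{\tau} \le \mathbf{b}^{(G)}(\mathbf{c})$, $\mu = \mathrm{ivec}(\boldsymbol{\mu}, \mathbf{b}^{(F)}(\mathbf{r}))$ and $\tau = \mathrm{ivec}(\boldsymbol{\tau}, \mathbf{b}^{(G)}(\mathbf{c}))$, 
then block $(\mu,\tau)$ of $\mathcal{H}_{R\times C}$ is given by 

$$(\mathcal{H}_{R\times C})_{\boldsymbol{\mu},\boldsymbol{\tau}} = \sum_{\mathbf{q}=\mathbf{1}}^{\mathbf{b}^{(F)}(\boldsymbol{\lambda})} (\mathcal{F}_{R\times\Lambda})_{\boldsymbol{\mu},\mathbf{q}}\, (\mathcal{G}_{\Psi\times C})_{\mathbf{q},\boldsymbol{\tau}}.$$

Using (3.17) this can be rewritten in terms of subtensor unfoldings. Indeed, if index 
vectors $\mathbf{k}$, $\mathbf{i}^{(\mathbf{q})}$, and $\mathbf{j}^{(\mathbf{q})}$ are defined by 

$$\begin{array}{ll} \mathbf{k}(\mathbf{r}) = \boldsymbol{\mu} & \mathbf{k}(\mathbf{c}) = \boldsymbol{\tau} \\ \mathbf{i}^{(\mathbf{q})}(\mathbf{r}) = \mathbf{k}(\mathbf{r}) & \mathbf{i}^{(\mathbf{q})}(\boldsymbol{\lambda}) = \mathbf{q} \\ \mathbf{j}^{(\mathbf{q})}(\boldsymbol{\psi}) = \mathbf{q} & \mathbf{j}^{(\mathbf{q})}(\mathbf{c}) = \mathbf{k}(\mathbf{c}) \end{array}$$

then 

$$(\mathcal{H}_{\mathbf{k}})_{[1:f]\times[f+1:f+g]} = \sum_{\mathbf{q}=\mathbf{1}}^{\mathbf{b}^{(F)}(\boldsymbol{\lambda})} (\mathcal{F}_{\mathbf{i}^{(\mathbf{q})}})_{\mathbf{r}\times\boldsymbol{\lambda}} \cdot (\mathcal{G}_{\mathbf{j}^{(\mathbf{q})}})_{\boldsymbol{\psi}\times\mathbf{c}}. \tag{4.9}$$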
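4.2. Blocked Multilinear Products. As an example of how the preceding 
results can be adapted to handle structured contractions, we briefly consider the 
multilinear product since we have developed the supporting formulae in §2.4 and 
§3.4. Suppose $\mathcal{A} \in \mathbb{R}^{n_1\times\cdots\times n_d}$ and that 

$$B^{(k)} \in \mathbb{R}^{q_k\times n_k}, \qquad k = 1, \ldots, d.$$

The tensor $\mathcal{C} \in \mathbb{R}^{q_1\times\cdots\times q_d}$ specified by 

$$\mathcal{C}(\mathbf{i}) = \sum_{\mathbf{k}=\mathbf{1}}^{\mathbf{n}} \mathcal{A}(\mathbf{k})\, B^{(1)}(i_1,k_1) \cdots B^{(d)}(i_d,k_d) \tag{4.10}$$

is the multilinear product of $\mathcal{A}$ with $B^{(1)}, \ldots, B^{(d)}$ and is denoted [5] by 

$$\mathcal{C} = (B^{(1)}, \ldots, B^{(d)}) \cdot \mathcal{A}.$$

If the order-$(2d)$ tensor $\mathcal{B}$ is defined by 

$$\mathcal{B} = B^{(1)} \circ \cdots \circ B^{(d)},$$

then we see that $\mathcal{C}$ is a contraction of the form 

$$\mathcal{C}(\mathbf{i}) = \sum_{\mathbf{k}=\mathbf{1}}^{\mathbf{n}} \mathcal{A}(\mathbf{k})\, \mathcal{B}(i_1,k_1,\ldots,i_d,k_d).$$

We apply Theorem 4.1 with $\mathcal{F} = \mathcal{B}$, $f = d$, $\ell = d$, $\mathcal{G} = \mathcal{A}$, $g = 0$, $\mathbf{r} = 1{:}2{:}2d$, $\boldsymbol{\lambda} = 
2{:}2{:}2d$, $\boldsymbol{\psi} = 1{:}d$, and $\mathbf{c} = \emptyset$. It follows that $\mathcal{A}_{\boldsymbol{\psi}\times\mathbf{c}} = \mathrm{vec}(\mathcal{A})$ and $\mathcal{C}_{[1:d]\times[d+1:d]} = \mathrm{vec}(\mathcal{C})$ 
and so from Theorem 4.1 and (2.25) we have 

$$\mathrm{vec}(\mathcal{C}) = \left( B^{(d)} \otimes \cdots \otimes B^{(1)} \right) \mathrm{vec}(\mathcal{A}). \tag{4.11}$$

If the $B$ matrices are blocked according to (3.21) and $R$ and $C$ are defined by (3.22)-
(3.23), then $R$ is a blocking for $\mathcal{C}$, $C$ is a blocking for $\mathcal{A}$, and 

$$P_R\, \mathrm{vec}(\mathcal{C}) = \left( P_R \left( B^{(d)} \otimes \cdots \otimes B^{(1)} \right) P_C^T \right) P_C\, \mathrm{vec}(\mathcal{A}). \tag{4.12}$$

From (3.24) we see that the matrix 

$$\mathcal{B}_{R\times C} = P_R \left( B^{(d)} \otimes \cdots \otimes B^{(1)} \right) P_C^T \tag{4.13}$$

is a block matrix whose entries are Kronecker products. Indeed, $\mathcal{B}_{R\times C}$ is essentially 
the Tracy-Singh product of the $B$-matrices, see [TB]. Thus, from (4.11)-(4.13) we have 
the following block specification for $\mathcal{C}$: 

$$\mathrm{vec}_R(\mathcal{C}) = \mathcal{B}_{R\times C}\, \mathrm{vec}_C(\mathcal{A}). \tag{4.14}$$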
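For $d = 2$ the multilinear product reduces to $C = B^{(1)} A\, (B^{(2)})^T$, which makes the Kronecker form (4.11) easy to check by hand or by machine. The sketch below does this with small integer matrices chosen for illustration; it forms the Kronecker product explicitly only for the purpose of the check.

```python
def kron(B, A):
    """Kronecker product B (x) A of two matrices stored as lists of rows."""
    return [[b * a for b in brow for a in arow] for brow in B for arow in A]

def vec(M):
    """Stack the columns of a matrix stored as a list of rows."""
    return [M[i][j] for j in range(len(M[0])) for i in range(len(M))]

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

A = [[1, 2, 3], [4, 5, 6]]                 # n1 x n2 = 2 x 3
B1 = [[1, 0], [2, 1], [0, 3]]              # q1 x n1 = 3 x 2
B2 = [[1, 1, 0], [0, 2, 1]]                # q2 x n2 = 2 x 3

# For d = 2 the multilinear product (4.10) is C = B1 * A * B2^T.
C = matmul(matmul(B1, A), [list(col) for col in zip(*B2)])

K = kron(B2, B1)                           # B^(2) (x) B^(1)
lhs = vec(C)
rhs = [sum(k * a for k, a in zip(row, vec(A))) for row in K]
```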

4.3. Visualization. As in block matrix computations, it is sometimes important 
to view a given blocked tensor contraction from different viewpoints. A small example 
builds an appreciation for this point. 

Suppose $\mathcal{F}$ is a 3 × 4 × 2 block tensor and $\mathcal{G}$ is a 2 × 3 × 5 block tensor such that 
the blockings in mode 3 in $\mathcal{F}$ and mode 1 in $\mathcal{G}$ conform. Let $\mathcal{H}$ be the 3 × 4 × 3 × 5 
block tensor whose elements are given by 

$$\mathcal{H}(i_1,i_2,j_1,j_2) = \sum_k \mathcal{F}(i_1,i_2,k) \cdot \mathcal{G}(k,j_1,j_2).$$
[Figure 4.1, three panels:] 

(1) The tensor contraction $\mathcal{H} = \mathcal{F} \star \mathcal{G}$ of two order-3 tensors viewed graphically as a 
contraction of conformally blocked tensors. 

(2) Block $\mathcal{H}_{abcd} = \mathcal{H}(\alpha_1{:}\alpha_2, \beta_1{:}\beta_2, \gamma_1{:}\gamma_2, \delta_1{:}\delta_2)$ is a $\star$-contraction of two "block fibers", 
one from $\mathcal{F}$ and one from $\mathcal{G}$. 

(3) The $\star$-contraction of the two block fibers is a sum of $\star$-contractions of fiber blocks, 
i.e., $\mathcal{H}_{abcd} = \mathcal{F}_{ab1} \star \mathcal{G}_{1cd} + \mathcal{F}_{ab2} \star \mathcal{G}_{2cd}$. 

Fig. 4.1. Three Levels of a Blocked Contraction 

For convenience, denote the operation of contracting two order-3 tensors $\mathcal{T}_1$ and $\mathcal{T}_2$ in 
this way as $\mathcal{T}_1 \star \mathcal{T}_2$, e.g., $\mathcal{H} = \mathcal{F} \star \mathcal{G}$. Fig 4.1 shows how this blocked contraction can 
be visualized at three different levels. At the lowest level, block $[a, b, c, d]$ in $\mathcal{H}$ can be 
computed via the matrix equation 

$$(\mathcal{H}_{abcd})_{[1\,2]\times[3\,4]} = (\mathcal{F}_{ab1})_{[1\,2]\times[3]} \cdot (\mathcal{G}_{1cd})_{[1]\times[2\,3]} + (\mathcal{F}_{ab2})_{[1\,2]\times[3]} \cdot (\mathcal{G}_{2cd})_{[1]\times[2\,3]}.$$

This follows from (4.9) and is depicted in part (3) of Fig 4.1. 

5. Concluding Remarks. Given the nature of this paper, it is important to 
be reminded in this closing section that there is a big difference between a cryptic 
mathematical formula and its utilization in practice. A case in point is the 
permutation matrix $P_M$ that is characterized in Theorem 3.1. Obviously, an integer vector 
should be used to represent a permutation matrix like $P_M$; it should never be 
computed as a two-dimensional array. We offer a few details based on the convention that 
if $P = I_n(\mathbf{v},:)$ where $\mathbf{v}$ is a permutation of $1{:}n$, then $\mathbf{v}$ represents $P$. We capture this 
connection with the notation $P_{\mathbf{v}}$. Note that if $y = P_{\mathbf{v}} x$, then $y = x(\mathbf{v})$ while $y(\mathbf{v}) = x$ 
implies $y = P_{\mathbf{v}}^T x$. Letting $\mathbf{1}_n$ denote the $n$-vector of ones, here are some basic facts 
that concern this style of representation: 

1. If $q$ and $r$ are positive integers and $\mathbf{w} = [\,1{:}r{:}qr\ \ 2{:}r{:}qr\ \cdots\ r{:}r{:}qr\,]$, then 
$P_{\mathbf{w}} = \Pi_{q,r}$, the $(q,r)$ perfect shuffle. 

2. If $\mathbf{u}$ and $\mathbf{v}$ are permutations of $1{:}n$ and $\mathbf{w} = \mathbf{v}(\mathbf{u})$, then $P_{\mathbf{w}} = P_{\mathbf{u}} P_{\mathbf{v}}$. 

3. If $\mathbf{u}$ is a permutation of $1{:}n$ and $\mathbf{v}$ is a permutation of $1{:}m$, then $P_{\mathbf{w}} = P_{\mathbf{u}} \otimes P_{\mathbf{v}}$ 
where $\mathbf{w} = \mathbf{1}_n \otimes \mathbf{v} + m \cdot (\mathbf{u} - \mathbf{1}_n) \otimes \mathbf{1}_m$. 

4. If $\mathbf{u}$ is a permutation of $1{:}n$ and $\mathbf{v}$ is a permutation of $1{:}m$, then $P_{\mathbf{w}} = 
\mathrm{diag}(P_{\mathbf{u}}, P_{\mathbf{v}})$ where $\mathbf{w} = [\,\mathbf{u}\ \ (n \cdot \mathbf{1}_m + \mathbf{v})\,]$. 

The vector representation of the matrix $P_M$, since it is defined by perfect shuffles, 
Kronecker products, and direct sums, can be efficiently assembled using these facts. 
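The four facts above translate into short index-vector routines. The sketch below uses 0-based indices and takes a permutation to act by gathering, so that the vector `w` sends `x` to `[x[k] for k in w]`; under that assumed convention the composition, Kronecker, and direct-sum rules can each be checked on small vectors. All function names are hypothetical.

```python
def permute(w, x):
    """Apply the permutation represented by index vector w: y = x(w)."""
    return [x[k] for k in w]

def shuffle(q, r):
    """Fact 1: index vector of the (q, r) perfect shuffle, 0-based."""
    return [k for start in range(r) for k in range(start, q * r, r)]

def compose(u, v):
    """Fact 2: w = v(u) represents 'apply v, then apply u'."""
    return [v[k] for k in u]

def kron_perm(u, v):
    """Fact 3: index vector of the Kronecker product of the two permutations
    (0-based form of 1_n (x) v + m (u - 1_n) (x) 1_m)."""
    m = len(v)
    return [m * uk + vk for uk in u for vk in v]

def direct_sum(u, v):
    """Fact 4: index vector of the block diagonal of the two permutations."""
    return list(u) + [len(u) + vk for vk in v]

u, v, t = [2, 0, 1], [1, 2, 0], [1, 0]
x, y = [10, 20, 30], [5, 7]
xy = [a * b for a in x for b in y]                 # x (x) y

two_step = permute(u, permute(v, x))               # apply v, then u
one_step = permute(compose(u, v), x)
kron_lhs = permute(kron_perm(u, t), xy)
kron_rhs = [a * b for a in permute(u, x) for b in permute(t, y)]
sum_lhs = permute(direct_sum(u, t), x + y)
sum_rhs = permute(u, x) + permute(t, y)
```

Each routine costs time proportional to the length of the permutation, which is the point of working with index vectors instead of two-dimensional arrays.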
Another illustration of the gap between formula and implementation concerns 
equation (4.11). The calculation of a multilinear product $\mathcal{C} = (B^{(1)}, \ldots, B^{(d)}) \cdot \mathcal{A}$ 
would not explicitly use this formula. Instead it would proceed as follows: 

for $i = 1, \ldots, d$ 

    $\mathcal{A} \leftarrow (I_{n_1}, \ldots, I_{n_{i-1}}, B^{(i)}, I_{n_{i+1}}, \ldots, I_{n_d}) \cdot \mathcal{A}$ 

end 

The $i$-th update is referred to as the $i$-mode product, see [H [13]. By using Theorem 
4.1 we see that this is equivalent to the matrix-matrix multiplication 

$$\mathcal{A}_{(i)} \leftarrow B^{(i)} \mathcal{A}_{(i)}$$

where $\mathcal{A}_{(i)} = \mathcal{A}_{[i]\times[1:i-1\ \ i+1:d]}$ is the mode-$i$ unfolding of $\mathcal{A}$ mentioned in §2.3. 

Similarly, in a block-based implementation of the multilinear product, one would 
not directly use (4.14). Instead, the block-matrix multiplications 

$$\mathcal{A}_{J\times C} \leftarrow B^{(i)}_{J\times I}\, \mathcal{A}_{I\times C}$$

would be carried out sequentially for modes $i = 1, \ldots, d$. Here, $I$ is the original 
blocking for mode $i$, $J$ is the new blocking of mode $i$ inherited from the row blocking 
of $B^{(i)}$, and $C$ is a blocking for modes $[\,1{:}i-1\ \ i+1{:}d\,]$ of $\mathcal{A}$. 
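The mode-by-mode loop can be sketched as follows. This is an unblocked illustration under stated assumptions: the tensor is a dictionary keyed by 1-based index tuples, each factor matrix is a list of rows, and the function `mode_product` is a hypothetical helper that applies one $i$-mode product directly on tensor entries, without forming unfoldings. The result is compared with the full sum that defines the multilinear product.

```python
from itertools import product

def index_range(n):
    """All 1-based multi-indices 1 <= i <= n, listed with the first index fastest."""
    return [t[::-1] for t in product(*(range(1, x + 1) for x in reversed(n)))]

def mode_product(A, n, i, B):
    """i-mode product: multiply mode i (1-based) of tensor A by the matrix B,
    where B has n[i-1] columns. Returns the new tensor and its shape."""
    new_n = n[:i - 1] + (len(B),) + n[i:]
    C = {}
    for x in index_range(new_n):
        C[x] = sum(B[x[i - 1] - 1][k - 1] * A[x[:i - 1] + (k,) + x[i:]]
                   for k in range(1, n[i - 1] + 1))
    return C, new_n

n = (2, 3, 2)
A = {x: 100 * x[0] + 10 * x[1] + x[2] for x in index_range(n)}
Bs = [[[1, 2]],                              # B^(1): 1 x 2
      [[1, 0, 1], [0, 1, 0]],                # B^(2): 2 x 3
      [[2, 0], [0, 1], [1, 1]]]              # B^(3): 3 x 2

C, shape = A, n
for i, B in enumerate(Bs, start=1):          # one mode at a time
    C, shape = mode_product(C, shape, i, B)

def direct(i):
    """Entry C(i) from the full multi-index sum over k."""
    return sum(A[k]
               * Bs[0][i[0] - 1][k[0] - 1]
               * Bs[1][i[1] - 1][k[1] - 1]
               * Bs[2][i[2] - 1][k[2] - 1]
               for k in index_range(n))

same = all(C[i] == direct(i) for i in index_range(shape))
```

A blocked version would replace the inner sum by block-matrix multiplications on the appropriate block unfolding, as described in the text.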

Overall, it is reasonable to conclude from the above that block tensors behave 
in much the same way as block matrices. Although the precise formulas are more 
involved, the basic intuition that "all operations can be done at the block level" is 
correct. By making precise the notion of a block unfolding and developing a framework 
for reasoning about block tensor computation, we hope that we have laid a modest 
foundation for further research. Our own agenda includes looking at block versions 
of the tensor contraction engine [5], developing recursive tensor data structures that 
extend the clever ideas in [3], expanding the functionality of the Tensor Toolbox 
[111 112] so that it supports block tensor computation, and analyzing block versions 
of various tensor iterations such as [5]. Throughout all this it will be important to 
chip away at the "notational divide" that currently besets the tensor computation 
community, see [5]. 